Current journal: Genome Research
  • Corrigendum: Murine single-cell RNA-seq reveals cell-identity- and tissue-specific trajectories of aging
    Genome Res. (IF 9.944) Pub Date : 2020-01-01
    Jacob C. Kimmel; Lolita Penland; Nimrod D. Rubinstein; David G. Hendrickson; David R. Kelley; Adam Z. Rosenthal

    Genome Research 29: 2088–2103 (2019)

    Updated: 2020-01-10
  • Corrigendum: Dynamics of cardiomyocyte transcriptome and chromatin landscape demarcates key events of heart development
    Genome Res. (IF 9.944) Pub Date : 2020-01-01
    Michal Pawlak; Katarzyna Z. Kedzierska; Maciej Migdal; Karim Abu Nahia; Jordan A. Ramilowski; Lukasz Bugajski; Kosuke Hashimoto; Aleksandra Marconi; Katarzyna Piwocka; Piero Carninci; Cecilia L. Winata

    Genome Research 29: 506–519 (2019)

    Updated: 2020-01-10
  • Reviewer Index, Volume 29, 2019
    Genome Res. (IF 9.944) Pub Date : 2019-12-01

    Abyzov, Alexej

    Updated: 2019-12-02
  • Reviewer Index, Volume 29, 2019.
    Genome Res. (IF 9.944) Pub Date : not listed

    Updated: 2019-11-01
  • Identifying gene function and module connections by the integration of multispecies expression compendia.
    Genome Res. (IF 9.944) Pub Date : 2019-11-23
    Hao Li, Daria Rukina, Fabrice P A David, Terytty Yang Li, Chang-Myung Oh, Arwen W Gao, Elena Katsyuba, Maroun Bou Sleiman, Andrea Komljenovic, Qingyao Huang, Robert W Williams, Marc Robinson-Rechavi, Kristina Schoonjans, Stephan Morgenthaler, Johan Auwerx

    The functions of many eukaryotic genes are still poorly understood. Here, we developed and validated a new method, termed GeneBridge, which is based on two linked approaches to impute gene function and bridge genes with biological processes. First, Gene-Module Association Determination (G-MAD) allows the annotation of gene function. Second, Module-Module Association Determination (M-MAD) predicts connectivity among modules. We applied the GeneBridge tools to large-scale multispecies expression compendia (1700 data sets with over 300,000 samples from human, mouse, rat, fly, worm, and yeast) collected in this study. G-MAD identifies novel functions of genes (for example, DDT in mitochondrial respiration and WDFY4 in T cell activation) and also suggests novel components for modules, such as those involved in cholesterol biosynthesis. By applying G-MAD to data sets from individual tissues, we identified tissue-specific gene functions, for instance, the roles of EHHADH in liver and kidney, as well as SLC6A1 in brain and liver. Using M-MAD, we identified a list of module-module associations, such as those between mitochondria and proteasome, mitochondria and histone demethylation, as well as ribosomes and lipid biosynthesis. The GeneBridge tools together with the expression compendia are available as an open resource, which will facilitate the identification of connections linking genes, modules, phenotypes, and diseases.

    Updated: 2019-11-01
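
The gene-module association idea behind G-MAD can be illustrated with a generic co-expression enrichment score. The sketch below is not the published G-MAD statistic. It assumes a single expression data set held in a hypothetical `expr` dictionary (gene to per-sample values) and scores a module by how highly its genes rank in correlation with a query gene, using a rank-based AUC.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_association(expr, query, module):
    """Rank-based association (AUC) between a query gene and a gene module.

    expr:   hypothetical dict mapping gene -> expression values across samples
    query:  gene whose function is being imputed
    module: set of genes forming the module (e.g., a pathway)
    Returns 0.5 for no association and 1.0 when every module gene is more
    correlated with the query than every non-module gene.
    """
    corr = {g: pearson(expr[query], v) for g, v in expr.items() if g != query}
    inside = [c for g, c in corr.items() if g in module]
    outside = [c for g, c in corr.items() if g not in module]
    # Normalized Mann-Whitney U: fraction of (module, non-module) pairs in which
    # the module gene is more correlated with the query; ties count as half.
    wins = sum((a > b) + 0.5 * (a == b) for a in inside for b in outside)
    return wins / (len(inside) * len(outside))
```

An AUC near 1 means the module's genes are among the query's strongest co-expression partners. Combining such per-data-set scores across a multispecies compendium would be a separate meta-analysis step that is not shown here.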
  • Modeling Niemann-Pick disease type C in a human haploid cell line allows for patient variant characterization and clinical interpretation.
    Genome Res. (IF 9.944) Pub Date : 2019-11-23
    Steven Erwood, Reid A Brewer, Teija M I Bily, Eleonora Maino, Liangchi Zhou, Ronald D Cohn, Evgueni A Ivakine

    The accurate clinical interpretation of human sequence variation is foundational to personalized medicine. This remains a pressing challenge, however, as genome sequencing becomes routine and new functionally undefined variants rapidly accumulate. Here, we describe a platform for the rapid generation, characterization, and interpretation of genomic variants in haploid cells, focusing on Niemann-Pick disease type C (NPC) as an example. NPC is a fatal neurodegenerative disorder characterized by lysosomal accumulation of unesterified cholesterol and glycolipids. In 95% of cases, NPC is caused by mutations in the NPC1 gene, for which more than 200 unique disease-causing variants have been reported to date. Furthermore, the majority of patients with NPC are compound heterozygotes who often carry at least one private mutation, presenting a challenge for the characterization and classification of individual variants. We have developed the first haploid cell model of NPC. This haploid cell model recapitulates the primary biochemical and molecular phenotypes typically found in patient-derived fibroblasts, illustrating its utility in modeling NPC. Additionally, we show the power of CRISPR/Cas9-mediated base editing for quickly and efficiently generating haploid cell models of individual patient variants in NPC. These models provide a platform for understanding the disease mechanisms underlying individual NPC1 variants while allowing for definitive clinical variant interpretation for NPC.

    Updated: 2019-11-01
  • Murine single-cell RNA-seq reveals cell-identity- and tissue-specific trajectories of aging.
    Genome Res. (IF 9.944) Pub Date : 2019-11-23
    Jacob C Kimmel, Lolita Penland, Nimrod D Rubinstein, David G Hendrickson, David R Kelley, Adam Z Rosenthal

    Aging is a pleiotropic process affecting many aspects of mammalian physiology. Mammals are composed of distinct cell type identities and tissue environments, but the influence of these cell identities and environments on the trajectory of aging in individual cells remains unclear. Here, we performed single-cell RNA-seq on >50,000 individual cells across three tissues in young and old mice to allow for direct comparison of aging phenotypes across cell types. We found transcriptional features of aging common across many cell types, as well as features of aging unique to each type. Leveraging matrix factorization and optimal transport methods, we found that both cell identities and tissue environments exert influence on the trajectory and magnitude of aging, with cell identity influence predominating. These results suggest that aging manifests with unique directionality and magnitude across the diverse cell identities in mammals.

    Updated: 2019-11-01
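
Optimal transport, one of the tools named in the abstract above, measures how much one distribution must be moved to match another. As a hedged illustration only (not the authors' pipeline), the one-dimensional case reduces to matching sorted values. Here it is applied to hypothetical per-cell scores for young and old cells of one cell type, assuming equal numbers of cells in each group.

```python
def wasserstein_1d(young, old):
    """Earth mover's (Wasserstein-1) distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of one sample to the i-th smallest value of the other, so the cost
    is the mean absolute difference between the sorted samples.
    """
    if len(young) != len(old):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(young), sorted(old))) / len(young)
```

A larger distance would correspond to a larger aging-associated shift for that cell type along the chosen axis; note that this scalar summary keeps the magnitude of the shift but discards its direction.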
  • The C. elegans 3' UTRome v2 resource for studying mRNA cleavage and polyadenylation, 3'-UTR biology, and miRNA targeting.
    Genome Res. (IF 9.944) Pub Date : 2019-11-21
    Hannah S Steber, Christina Gallante, Shannon O'Brien, Po-Lin Chiu, Marco Mangone

    3' untranslated regions (3' UTRs) of mRNAs have emerged as central regulators of cellular function because they contain important but poorly characterized cis-regulatory elements targeted by a multitude of regulatory factors. The model nematode Caenorhabditis elegans is ideal to study these interactions because it possesses a well-defined 3' UTRome. To improve its annotation, we used a genome-wide bioinformatics approach to download raw data for 1088 transcriptome data sets, corresponding to the entire collection of C. elegans transcriptomes from 2015 to 2018 in the Sequence Read Archive at the NCBI. We then extracted and mapped high-quality 3'-UTR data at ultradeep coverage. Here, we describe and release to the community the updated version of the worm 3' UTRome, which we named 3' UTRome v2. This resource contains high-quality 3'-UTR data mapped at single-base ultraresolution for 23,084 3'-UTR isoform variants corresponding to 14,788 protein-coding genes and is updated to the latest release of WormBase. We used this data set to study and probe principles of mRNA cleavage and polyadenylation in C. elegans. The worm 3' UTRome v2 represents the most comprehensive and high-resolution 3'-UTR data set available in C. elegans and provides a novel resource to investigate the mRNA cleavage and polyadenylation reaction, 3'-UTR biology, and miRNA targeting in a living organism.

    Updated: 2019-11-01
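
Mapping 3'-UTR ends at single-base resolution depends on locating where the non-templated poly(A) tail begins in a read. A minimal sketch, assuming plus-strand reads that end in a poly(A) tail and a 0-based genomic start coordinate; the function name and the `min_tail` cutoff are hypothetical and are not taken from the 3' UTRome v2 pipeline.

```python
def cleavage_site(read, read_start, min_tail=8):
    """Return (cleavage_coordinate, trimmed_read) for a + strand read, or None.

    The read is assumed to start at 0-based genomic coordinate `read_start`
    and to end in a non-templated poly(A) tail of at least `min_tail` bases.
    The cleavage coordinate is the last genomic base before the tail.
    """
    trimmed = read.rstrip("A")
    tail_len = len(read) - len(trimmed)
    if tail_len < min_tail or not trimmed:
        return None
    # Caveat: genome-encoded A's just upstream of the true site are also
    # stripped here; a real pipeline would check the trimmed end against the genome.
    return read_start + len(trimmed) - 1, trimmed
```

Piling up the returned coordinates across many reads would give the cleavage-site clusters from which 3'-UTR isoform ends can be called.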
  • A high-resolution gene expression atlas links dedicated meristem genes to key architectural traits.
    Genome Res. (IF 9.944) Pub Date : 2019-11-21
    Steffen Knauer, Marie Javelle, Lin Li, Xianran Li, Xiaoli Ma, Kokulapalan Wimalanathan, Sunita Kumari, Robyn Johnston, Samuel Leiboff, Robert Meeley, Patrick S Schnable, Doreen Ware, Carolyn Lawrence-Dill, Jianming Yu, Gary J Muehlbauer, Michael J Scanlon, Marja C P Timmermans

    The shoot apical meristem (SAM) orchestrates the balance between stem cell proliferation and organ initiation essential for postembryonic shoot growth. Meristems show a striking diversity in shape and size. How this morphological diversity relates to variation in plant architecture and the molecular circuitries driving it are unclear. By generating a high-resolution gene expression atlas of the vegetative maize shoot apex, we show here that distinct sets of genes govern the regulation and identity of stem cells in maize versus Arabidopsis. Cell identities in the maize SAM reflect the combinatorial activity of transcription factors (TFs) that drive the preferential, differential expression of individual members within gene families functioning in a plethora of cellular processes. Subfunctionalization thus emerges as a fundamental feature underlying cell identity. Moreover, we show that adult plant characters are, to a significant degree, regulated by gene circuitries acting in the SAM, with natural variation modulating agronomically important architectural traits enriched specifically near dynamically expressed SAM genes and the TFs that regulate them. Besides unique mechanisms of maize stem cell regulation, our atlas thus identifies key new targets for crop improvement.

    Updated: 2019-11-01
  • Chromatin-sensitive cryptic promoters putatively drive expression of alternative protein isoforms in yeast.
    Genome Res. (IF 9.944) Pub Date : 2019-11-20
    Wu Wei, Bianca P Hennig, Jingwen Wang, Yujie Zhang, Ilaria Piazza, Yerma Pareja Sanchez, Christophe D Chabbert, Sophie H Adjalley, Lars M Steinmetz, Vicent Pelechano

    Cryptic transcription is widespread and generates a heterogeneous group of RNA molecules of unknown function. To improve our understanding of cryptic transcription, we investigated the transcription start site (TSS) usage, chromatin organization, and posttranscriptional consequences of cryptic transcripts in Saccharomyces cerevisiae. We show that TSSs of chromatin-sensitive internal cryptic transcripts retain features comparable to those of canonical TSSs in terms of DNA sequence, directionality, and chromatin accessibility. We define the 5' and 3' boundaries of cryptic transcripts and show that, in contrast to RNA degradation-sensitive ones, they often overlap with the end of the gene, thereby using the canonical polyadenylation site, and associate with polyribosomes. We show that chromatin-sensitive cryptic transcripts can be recognized by ribosomes and may produce truncated polypeptides from downstream, in-frame start codons. Finally, we confirm the presence of the predicted polypeptides by reanalyzing N-terminal proteomic data sets. Our work suggests that a fraction of chromatin-sensitive internal cryptic promoters initiates the transcription of alternative truncated mRNA isoforms. The expression of these chromatin-sensitive isoforms is conserved from yeast to human, expanding the functional consequences of cryptic transcription and proteome complexity.

    Updated: 2019-11-01
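
The idea that a cryptic TSS inside a gene can yield an N-terminally truncated protein can be made concrete: translation would begin at the first start codon downstream of the TSS that lies in frame with the annotated coding sequence. A simplified sketch, assuming a hypothetical CDS string that begins at its annotated ATG and a TSS given as an offset into that string; it ignores out-of-frame upstream ATGs and any sequence context around the start codon.

```python
def truncated_start(cds, tss):
    """Index of the first in-frame ATG at or after offset `tss` in `cds`, or None.

    `cds` starts with the annotated ATG, so in-frame codons begin at
    multiples of 3.
    """
    first = tss + (-tss) % 3  # round up to the next codon boundary
    for i in range(first, len(cds) - 2, 3):
        if cds[i:i + 3] == "ATG":
            return i
    return None
```

For example, with `cds = "ATGAAACCCATGGGGTAA"` and a TSS at offset 1, the function returns 9, so the truncated polypeptide would lack the first 9 // 3 = 3 residues of the annotated protein.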
  • Promoter-specific dynamics of TATA-binding protein association with the human genome.
    Genome Res. (IF 9.944) Pub Date : 2019-11-17
    Yuko Hasegawa, Kevin Struhl

    Transcription factor binding to target sites in vivo is a dynamic process that involves cycles of association and dissociation, with individual proteins differing in their binding dynamics. The dynamics at individual sites on a genomic scale have been investigated in yeast cells, but comparable experiments have not been done in multicellular eukaryotes. Here, we describe a tamoxifen-inducible, time-course ChIP-seq approach to measure transcription factor binding dynamics at target sites throughout the human genome. As observed in yeast cells, the TATA-binding protein (TBP) typically displays rapid turnover at RNA polymerase (Pol) II-transcribed promoters, slow turnover at Pol III promoters, and very slow turnover at the Pol I promoter. Turnover rates vary widely among Pol II promoters in a manner that does not correlate with the level of TBP occupancy. Human Pol II promoters with slow TBP dissociation preferentially contain a TATA consensus motif, support high transcriptional activity of downstream genes, and are linked with specific activators and chromatin remodelers. These properties of human promoters with slow TBP turnover differ from those of yeast promoters with slow turnover. These observations suggest that TBP binding dynamics differentially affect promoter function and gene expression, possibly at the level of transcriptional reinitiation/bursting.

    Updated: 2019-11-01
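
To illustrate how a binding time course can be turned into a turnover rate, suppose (an assumption, not necessarily the authors' model) that after induction the fraction f(t) of TBP at a promoter that is the newly induced form rises as f(t) = 1 - exp(-k t). Linearizing as -ln(1 - f) = k t gives a one-parameter least-squares fit through the origin, and 1/k is then a mean residence time.

```python
import math


def turnover_rate(times, fraction_new):
    """Least-squares k for f(t) = 1 - exp(-k t), via -ln(1 - f) = k t.

    Assumes 0 <= f < 1 at every time point and a first-order exchange model
    with a constant supply of the induced protein.
    """
    ys = [-math.log(1.0 - f) for f in fraction_new]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
```

Under this model, promoters with slow TBP dissociation would show a small k (long residence time), which is the property the abstract associates with TATA motifs and high transcriptional activity.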
  • Deep profiling and custom databases improve detection of proteoforms generated by alternative splicing.
    Genome Res. (IF 9.944) Pub Date : 2019-11-16
    Laura M Agosto, Matthew R Gazzara, Caleb M Radens, Simone Sidoli, Josue Baeza, Benjamin A Garcia, Kristen W Lynch

    Alternative pre-mRNA splicing has long been proposed to contribute greatly to proteome complexity. However, the extent to which mature mRNA isoforms are successfully translated into protein remains controversial. Here, we used high-throughput RNA sequencing and mass spectrometry (MS)-based proteomics to better evaluate the translation of alternatively spliced mRNAs. To increase proteome coverage and improve protein quantitation, we optimized cell fractionation and sample processing steps at both the protein and peptide level. Furthermore, we generated a custom peptide database trained on analysis of RNA-seq data with MAJIQ, an algorithm optimized to detect and quantify differential and unannotated splice junction usage. We matched tandem mass spectra acquired by data-dependent acquisition (DDA) against our custom RNA-seq-based database, as well as the SWISS-PROT and RefSeq databases, which improved identification of splicing-derived proteoforms by 28% compared with use of the SWISS-PROT database alone. Altogether, we identified peptide evidence for 554 alternative proteoforms corresponding to 274 genes. Our increased depth and detection of proteins also allowed us to track changes in the transcriptome and proteome induced by T-cell stimulation, as well as fluctuations in protein subcellular localization. In sum, our data confirm that the use of generic databases in proteomic studies underestimates the number of spliced mRNA isoforms that are translated into protein, and we provide a workflow that improves isoform detection in large-scale proteomic experiments.

    Updated: 2019-11-01
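
The benefit of a custom, RNA-seq-derived database comes from peptides that exist only in an alternative isoform. A minimal sketch, assuming the standard in-silico trypsin rule (cleave after K or R unless the next residue is P), a hypothetical minimum peptide length of 7, and protein sequences already translated from each isoform; this is not the MAJIQ-based workflow itself.

```python
import re


def tryptic_peptides(protein, min_len=7):
    """In-silico trypsin digest: cleave C-terminal to K or R, except before P."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in peptides if len(p) >= min_len}


def isoform_specific_peptides(isoform, reference_proteins, min_len=7):
    """Tryptic peptides of `isoform` that occur in no reference protein's digest."""
    reference = set()
    for prot in reference_proteins:
        reference |= tryptic_peptides(prot, min_len)
    return tryptic_peptides(isoform, min_len) - reference
```

Only peptides returned by `isoform_specific_peptides` can serve as unambiguous evidence that the alternative isoform, rather than the reference protein, is translated.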
  • Clonal copy-number mosaicism in autoreactive T lymphocytes in diabetic NOD mice.
    Genome Res. (IF 9.944) Pub Date : 2019-11-07
    Maha Alriyami, Luc Marchand, Quan Li, Xiaoyu Du, Martin Olivier, Constantin Polychronakos

    Concordance for type 1 diabetes (T1D) is far from 100% in monozygotic twins and in inbred nonobese diabetic (NOD) mice, despite genetic identity and shared environment during incidence peak years. This points to stochastic determinants, such as postzygotic mutations (PZMs) in the expanding antigen-specific autoreactive T cell lineages, by analogy to their role in the expanding tumor lineage in cancer. Using comparative genomic hybridization of DNA from pancreatic lymph-node memory CD4+ T cells of 25 diabetic NOD mice, we found lymphocyte-exclusive mosaic somatic copy-number aberrations (CNAs) with highly nonrandom independent involvement of the same gene(s) across different mice, some with an autoimmunity association (e.g., Ilf3 and Dgka). We confirmed genes of interest using the gold standard approach for CNA quantification, multiplex ligation-dependent probe amplification (MLPA), as an independent method. As controls, we examined lymphocytes expanded during normal host defense (17 NOD and BALB/c mice infected with the Leishmania major parasite). Here, the CNAs found were fewer and significantly smaller compared to those in autoreactive cells (P = 0.0019). We determined a low T cell clonality for our samples, suggesting a prethymic formation of these CNAs. In this study, we describe a novel, previously unexplored phenomenon: a potential causal contribution of PZMs in autoreactive T cells to T1D pathogenesis. We expect that exploration of point mutations and studies in human T cells will enable the further delineation of driver genes to target for functional studies. Our findings challenge the classical notions of autoimmunity and open conceptual avenues toward individualized prevention and therapeutics.

    Updated: 2019-11-01
  • AIDE: annotation-assisted isoform discovery with high precision.
    Genome Res. (IF 9.944) Pub Date : 2019-11-07
    Wei Vivian Li, Shan Li, Xin Tong, Ling Deng, Hubing Shi, Jingyi Jessica Li

    Accurate genome-wide identification and quantification of full-length mRNA isoforms is crucial for investigating transcriptional and posttranscriptional regulatory mechanisms of biological phenomena. Despite continuing efforts in developing effective computational tools to identify or assemble full-length mRNA isoforms from second-generation RNA-seq data, it remains a challenge to accurately identify mRNA isoforms from short sequence reads owing to the substantial information loss in RNA-seq experiments. Here, we introduce a novel statistical method, annotation-assisted isoform discovery (AIDE), the first approach that directly controls false isoform discoveries by implementing the testing-based model selection principle. Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. We evaluate the performance of AIDE based on multiple simulated and real RNA-seq data sets followed by PCR-Sanger sequencing validation. Our results show that AIDE effectively leverages the annotation information to compensate for the information loss owing to short read lengths. AIDE achieves the highest precision in isoform discovery and the lowest error rates in isoform abundance estimation, compared with three state-of-the-art methods: Cufflinks, SLIDE, and StringTie. As a robust bioinformatics tool for transcriptome analysis, AIDE enables researchers to discover novel transcripts with high confidence.

    Updated: 2019-11-01
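
AIDE's central idea, adding a novel isoform only when it significantly improves the fit, can be sketched as forward selection with a likelihood-ratio test. This is a simplified, forward-only illustration and not AIDE's implementation: the `loglik` callable (the log-likelihood of the observed reads given a set of isoforms) is assumed to be supplied, and 3.841 is the 95% quantile of the chi-square distribution with one degree of freedom.

```python
CHI2_1DF_CRIT_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def stepwise_select(annotated, candidates, loglik, crit=CHI2_1DF_CRIT_05):
    """Start from the annotated isoforms and greedily add the candidate novel
    isoform that most increases the log-likelihood, but only while the
    likelihood-ratio statistic 2 * (ll_new - ll_old) exceeds `crit`."""
    selected = list(annotated)
    remaining = list(candidates)
    current = loglik(selected)
    while remaining:
        best = max(remaining, key=lambda iso: loglik(selected + [iso]))
        best_ll = loglik(selected + [best])
        if 2 * (best_ll - current) <= crit:
            break  # no remaining candidate improves the fit significantly
        selected.append(best)
        remaining.remove(best)
        current = best_ll
    return selected
```

Starting from the annotation and requiring a significant improvement for every addition is what makes the procedure conservative: a novel isoform that explains the reads only marginally better is left out.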
  • Absolute nucleosome occupancy map for the Saccharomyces cerevisiae genome.
    Genome Res. (IF 9.944) Pub Date : 2019-11-07
    Elisa Oberbeckmann, Michael Wolff, Nils Krietenstein, Mark Heron, Jessica L Ellins, Andrea Schmid, Stefan Krebs, Helmut Blum, Ulrich Gerland, Philipp Korber

    Mapping of nucleosomes, the basic DNA packaging unit in eukaryotes, is fundamental for understanding genome regulation because nucleosomes modulate DNA access by their positioning along the genome. A cell-population nucleosome map requires two observables: nucleosome positions along the DNA ("Where?") and nucleosome occupancies across the population ("In how many cells?"). All available genome-wide nucleosome mapping techniques are yield methods because they score either nucleosomal (e.g., MNase-seq, chemical cleavage-seq) or nonnucleosomal (e.g., ATAC-seq) DNA but lose track of the total DNA population for each genomic region. Therefore, they provide only nucleosome positions and, at most, relative occupancies between positions, but cannot measure absolute nucleosome occupancy, which is the fraction of all DNA molecules occupied by a nucleosome at a given position and time. Here, we established two orthogonal and thereby cross-validating approaches to measure absolute nucleosome occupancy across the Saccharomyces cerevisiae genome via restriction enzymes and DNA methyltransferases. The resulting high-resolution (9-bp) map shows uniform absolute occupancies. Most nucleosome positions are occupied in most cells: 97% of all nucleosomes called by chemical cleavage-seq have a mean absolute occupancy of 90 ± 6% (±SD). Depending on nucleosome position calling procedures, there are 57,000 to 60,000 nucleosomes per yeast cell. The few nucleosomes with low absolute occupancy do not correlate with highly transcribed gene bodies, but correlate with increased presence of the nucleosome-evicting chromatin structure remodeling (RSC) complex, and are enriched upstream of highly transcribed or regulated genes. Our work provides a quantitative method and reference frame in absolute terms for future chromatin studies.

    Updated: 2019-11-01
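
The distinction between yield and absolute measurements is essentially arithmetic. Assuming an idealized restriction site that is cut in every nucleosome-free DNA molecule and protected in every nucleosome-bound one, absolute occupancy is the uncut fraction of all molecules, and summing occupancies over all nucleosome positions gives the expected number of nucleosomes per cell. The function names and counts below are hypothetical.

```python
def absolute_occupancy(cut, uncut):
    """Fraction of all DNA molecules protected (uncut) at one site.

    Assumes complete digestion of free DNA and full protection by nucleosomes.
    """
    return uncut / (cut + uncut)


def nucleosomes_per_cell(occupancies):
    """Expected nucleosome count per cell: the sum of per-position occupancies."""
    return sum(occupancies)
```

A yield method sees only the cut (or only the uncut) molecules, so it cannot form the ratio in `absolute_occupancy`; that is why it reports positions and relative occupancies but not absolute ones.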
  • Network-based hierarchical population structure analysis for large genomic data sets.
    Genome Res. (IF 9.944) Pub Date : 2019-11-07
    Gili Greenbaum, Amir Rubin, Alan R Templeton, Noah A Rosenberg

    Analysis of population structure in natural populations using genetic data is a common practice in ecological and evolutionary studies. With large genomic data sets of populations now appearing more frequently across the taxonomic spectrum, it is becoming increasingly possible to reveal many hierarchical levels of structure, including fine-scale genetic clusters. To analyze these data sets, methods need to be appropriately suited to the challenges of extracting multilevel structure from whole-genome data. Here, we present a network-based approach for constructing population structure representations from genetic data. The use of community-detection algorithms from network theory generates a natural hierarchical perspective on the representation that the method produces. The method is computationally efficient, and it requires relatively few assumptions regarding the biological processes that underlie the data. We demonstrate the approach by analyzing population structure in the model plant species Arabidopsis thaliana and in human populations. These examples illustrate how network-based approaches for population structure analysis are well-suited to extracting valuable ecological and evolutionary information in the era of large genomic data sets.

    Updated: 2019-11-01
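
A highly simplified illustration of network-based population structure: individuals are nodes, edges connect pairs whose genetic similarity passes a threshold, and clusters are read off the network. The sketch uses thresholded connected components (via union-find) as a crude stand-in for the community-detection algorithms the abstract refers to, and a simple allele-sharing similarity on hypothetical 0/1/2 genotype vectors. Raising the threshold splits clusters, which hints at how a hierarchy of structure emerges.

```python
def allele_sharing(g1, g2):
    """Similarity of two individuals genotyped as 0/1/2 alt-allele counts:
    1 minus the mean per-locus allele difference, scaled to [0, 1]."""
    return 1 - sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def clusters(genotypes, threshold):
    """Connected components of the network whose edges join individuals with
    similarity >= threshold. `genotypes` maps name -> genotype vector."""
    names = list(genotypes)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_sharing(genotypes[a], genotypes[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda s: sorted(s))
```

Scanning the threshold from low to high and recording where components split yields a nested series of clusters; community detection refines this by also using how densely each group is connected internally.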
  • Synchronized replication of genes encoding the same protein complex in fast-proliferating cells.
    Genome Res. (IF 9.944) Pub Date : 2019-10-31
    Ying Chen, Ke Li, Xiao Chu, Lucas B Carey, Wenfeng Qian

    DNA replication perturbs the dosage balance among genes; at mid-S phase, early-replicating genes have doubled their copies while late-replicating ones have not. Dosage imbalance among genes, especially among members of a protein complex, is toxic to cells. However, the molecular mechanisms that cells use to deal with such imbalance are not fully understood. Here, we validate at the genomic scale that the dosage between early- and late-replicating genes is imbalanced in HeLa cells. We propose the synchronized replication hypothesis, in which genes sensitive to stoichiometric relationships are replicated simultaneously to maintain stoichiometry. In support of this hypothesis, we observe that genes encoding the same protein complex have similar replication timing, but mainly in fast-proliferating cells such as embryonic stem cells and cancer cells. We find that the synchronized replication observed in cancer cells, but not in slow-proliferating differentiated cells, is due to convergent evolution during tumorigenesis that restores synchronized replication timing within protein complexes. Taken together, our study reveals that the demand for dosage balance during S phase plays an important role in the optimization of the replication-timing program; this selection is relaxed during differentiation as the cell cycle prolongs and is restored during tumorigenesis as the cell cycle shortens.

    Updated: 2019-11-01
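
The synchronized replication hypothesis suggests a simple test statistic: genes of one complex should differ less in replication timing than random gene pairs do. A minimal sketch, assuming a hypothetical `rt` mapping from gene to a replication-timing value; the authors' actual statistic may differ, and a real analysis would compare against a permutation null rather than a single background mean.

```python
from itertools import combinations


def mean_pairwise_diff(values):
    """Mean absolute difference over all unordered pairs (needs >= 2 values)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def synchrony_ratio(rt, complex_genes):
    """Within-complex vs. genome-wide mean pairwise replication-timing difference.

    rt: hypothetical dict mapping gene -> replication timing (arbitrary units).
    Values below 1 mean the complex replicates more synchronously than
    random gene pairs.
    """
    within = mean_pairwise_diff([rt[g] for g in complex_genes])
    background = mean_pairwise_diff(list(rt.values()))
    return within / background
```

Computing this ratio separately in fast- and slow-proliferating cell types would be one way to look for the proliferation dependence described in the abstract.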
  • Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.
    Genome Res. (IF 9.944) Pub Date : 2019-09-21
    Jonathan M Mudge, Irwin Jungreis, Toby Hunt, Jose Manuel Gonzalez, James C Wright, Mike Kay, Claire Davidson, Stephen Fitzgerald, Ruth Seal, Susan Tweedie, Liang He, Robert M Waterhouse, Yue Li, Elspeth Bruford, Jyoti S Choudhary, Adam Frankish, Manolis Kellis

    The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.

    Updated: 2019-11-01
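
PhyloCSF scores a region as a log-likelihood ratio between a coding and a noncoding model of codon evolution; to our understanding the score is reported in decibans. The sketch below shows only that bookkeeping, assuming per-codon natural-log likelihoods under each model are already available and that codons contribute independently. It does not implement the phylogenetic codon models themselves, and the function names are hypothetical.

```python
import math


def llr_decibans(loglik_coding, loglik_noncoding):
    """Convert natural-log likelihoods to a log-likelihood ratio in decibans,
    i.e. 10 * log10(L_coding / L_noncoding)."""
    return 10 * (loglik_coding - loglik_noncoding) / math.log(10)


def region_score(codon_logliks):
    """Score a region from per-codon (coding, noncoding) natural-log likelihoods,
    assuming codons are independent so that log-likelihoods add."""
    coding = sum(c for c, _ in codon_logliks)
    noncoding = sum(n for _, n in codon_logliks)
    return llr_decibans(coding, noncoding)
```

Positive scores favor the coding model and negative scores the noncoding one; in this framing, "high-scoring" regions are those with large positive totals.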
  • Accessibility of promoter DNA is not the primary determinant of chromatin-mediated gene regulation.
    Genome Res. (IF 9.944) Pub Date : 2019-09-13
    Răzvan V Chereji, Peter R Eriksson, Josefina Ocampo, Hemant K Prajapati, David J Clark

    DNA accessibility is thought to be of major importance in regulating gene expression. We test this hypothesis using a restriction enzyme as a probe of chromatin structure and as a proxy for transcription factors. We measured the digestion rate and the fraction of accessible DNA at almost all genomic AluI sites in budding yeast and mouse liver nuclei. Hepatocyte DNA is more accessible than yeast DNA, consistent with longer linkers between nucleosomes, suggesting that nucleosome spacing is a major determinant of accessibility. DNA accessibility varies from cell to cell, such that essentially no sites are accessible or inaccessible in every cell. AluI sites in inactive mouse promoters are accessible in some cells, implying that transcription factors could bind without activating the gene. Euchromatin and heterochromatin have very similar accessibilities, suggesting that transcription factors can penetrate heterochromatin. Thus, DNA accessibility is not likely to be the primary determinant of gene regulation.

    Updated: 2019-11-01
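
The two quantities measured per AluI site, digestion rate and accessible fraction, correspond to the two parameters of a simple saturating model, cut fraction = A(1 - exp(-k t)). A hedged sketch, assuming this single-exponential model (not necessarily the authors' fitting procedure) and a hypothetical grid of rates: for each candidate k, the least-squares A has a closed form, and the pair with the smallest squared error is kept.

```python
import math


def fit_accessibility(times, cut_fraction, k_grid=None):
    """Fit cut_fraction ~ A * (1 - exp(-k t)); return (A, k).

    A is the accessible fraction (the plateau) and k the digestion rate.
    For a fixed k, the least-squares A is sum(y * g) / sum(g * g).
    """
    if k_grid is None:
        k_grid = [i / 100 for i in range(1, 301)]  # 0.01 .. 3.00, arbitrary units
    best = None
    for k in k_grid:
        g = [1 - math.exp(-k * t) for t in times]
        a = sum(y * gi for y, gi in zip(cut_fraction, g)) / sum(gi * gi for gi in g)
        sse = sum((y - a * gi) ** 2 for y, gi in zip(cut_fraction, g))
        if best is None or sse < best[0]:
            best = (sse, a, k)
    return best[1], best[2]
```

Under this model, a site whose cut fraction plateaus below 1 (A < 1) is inaccessible in some cells, consistent with the abstract's observation that accessibility varies from cell to cell.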
  • methyl-ATAC-seq measures DNA methylation at accessible chromatin.
    Genome Res. (IF 9.944) Pub Date : 2019-06-05
    Roman Spektor, Nathaniel D Tippens, Claudia A Mimoso, Paul D Soloway

    Chromatin features are characterized by genome-wide assays for nucleosome location, protein binding sites, three-dimensional interactions, and modifications to histones and DNA. For example, assay for transposase accessible chromatin sequencing (ATAC-seq) identifies nucleosome-depleted (open) chromatin, which harbors potentially active gene regulatory sequences; and bisulfite sequencing (BS-seq) quantifies DNA methylation. When two distinct chromatin features like these are assayed separately in populations of cells, it is impossible to determine, with certainty, where the features are coincident in the genome by simply overlaying data sets. Here, we describe methyl-ATAC-seq (mATAC-seq), which implements modifications to ATAC-seq, including subjecting the output to BS-seq. Merging these assays into a single protocol identifies the locations of open chromatin and reveals, unambiguously, the DNA methylation state of the underlying DNA. Such combinatorial methods eliminate the need to perform assays independently and infer where features are coincident.

    Updated: 2019-11-01
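
Because every mATAC-seq read originates from a transposase-accessible fragment, the usual bisulfite calculation, applied to those reads, reports methylation within open chromatin. Bisulfite converts unmethylated cytosine to uracil (read as T), while methylated cytosine remains C. A minimal sketch for one CpG on the plus strand, assuming base calls have already been extracted from aligned reads (a hypothetical input format):

```python
def methylation_level(bases):
    """Methylated fraction at one + strand CpG from bisulfite base calls.

    'C' = unconverted cytosine (methylated); 'T' = converted (unmethylated).
    Other calls (mismatches, N) are ignored. Returns None with no coverage.
    """
    c = bases.count("C")
    t = bases.count("T")
    return c / (c + t) if c + t else None
```

For example, three C calls and one T call give a methylation level of 0.75 at that CpG.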
  • A massively parallel 3' UTR reporter assay reveals relationships between nucleotide content, sequence conservation, and mRNA destabilization.
    Genome Res. (IF 9.944) Pub Date : 2019-06-04
    Adam J Litterman, Robin Kageyama, Olivier Le Tonqueze, Wenxue Zhao, John D Gagnon, Hani Goodarzi, David J Erle, K Mark Ansel

    Compared to coding sequences, untranslated regions of the transcriptome are not well conserved, and functional annotation of these sequences is challenging. Global relationships between nucleotide composition of 3' UTR sequences and their sequence conservation have been appreciated since mammalian genomes were first sequenced, but the functional relevance of these patterns remains unknown. We systematically measured the effect on gene expression of the sequences of more than 25,000 RNA-binding protein (RBP) binding sites in primary mouse T cells using a massively parallel reporter assay. GC-rich sequences destabilized reporter mRNAs and came from more rapidly evolving regions of the genome. These sequences were more likely to be folded in vivo and contained a number of structural motifs that reduced accumulation of a heterologous reporter protein. Comparison of full-length 3' UTR sequences across vertebrate phylogeny revealed that strictly conserved 3' UTRs were GC-poor and enriched in genes associated with organismal development. In contrast, rapidly evolving 3' UTRs tended to be GC-rich and derived from genes involved in metabolism and immune responses. Cell-essential genes had lower GC content in their 3' UTRs, suggesting a connection between unstructured mRNA noncoding sequences and optimal protein production. By reducing gene expression, GC-rich RBP-occupied sequences act as a rapidly evolving substrate for gene regulatory interactions.

    Updated: 2019-11-01
  • Pangolin genomes and the evolution of mammalian scales and immunity.
    Genome Res. (IF 9.944) Pub Date : 2016-08-12
    Siew Woh Choo, Mike Rayko, Tze King Tan, Ranjeev Hari, Aleksey Komissarov, Wei Yee Wee, Andrey A Yurchenko, Sergey Kliver, Gaik Tamazian, Agostinho Antunes, Richard K Wilson, Wesley C Warren, Klaus-Peter Koepfli, Patrick Minx, Ksenia Krasheninnikova, Antoinette Kotze, Desire L Dalton, Elaine Vermaak, Ian C Paterson, Pavel Dobrynin, Frankie Thomas Sitam, Jeffrine J Rovie-Ryan, Warren E Johnson, Aini Mohamed Yusoff, Shu-Jin Luo, Kayal Vizi Karuppannan, Gang Fang, Deyou Zheng, Mark B Gerstein, Leonard Lipovich, Stephen J O'Brien, Guat Jah Wong

    Pangolins, unique mammals with scales over most of their body, no teeth, poor vision, and an acute olfactory system, comprise the only placental order (Pholidota) without a whole-genome map. To investigate pangolin biology and evolution, we developed genome assemblies of the Malayan (Manis javanica) and Chinese (M. pentadactyla) pangolins. Strikingly, we found that interferon epsilon (IFNE), exclusively expressed in epithelial cells and important in skin and mucosal immunity, is pseudogenized in all African and Asian pangolin species that we examined, perhaps impacting resistance to infection. We propose that scale development was an innovation that provided protection against injuries or stress and reduced pangolin vulnerability to infection. Further evidence of specialized adaptations was evident from positively selected genes involving immunity-related pathways, inflammation, energy storage and metabolism, muscular and nervous systems, and scale/hair development. Olfactory receptor gene families are significantly expanded in pangolins, reflecting their well-developed olfaction system. This study provides insights into mammalian adaptation and functional diversification, new research tools and questions, and perhaps a new natural IFNE-deficient animal model for studying mammalian immunity.

    Updated: 2019-11-01
  • Diversification and collapse of a telomere elongation mechanism.
    Genome Res. (IF 9.944) Pub Date : 2019-05-30
    Bastien Saint-Leandre,Son C Nguyen,Mia T Levine

    In most eukaryotes, telomerase counteracts chromosome erosion by adding repetitive sequence to terminal ends. Drosophila melanogaster instead relies on specialized retrotransposons that insert exclusively at telomeres. This exchange of goods between host and mobile element, in which the mobile element provides an essential genome service and the host provides a hospitable niche for mobile element propagation, has been called a "genomic symbiosis." However, these telomere-specialized, jockey family retrotransposons may actually evolve to "selfishly" overreplicate in the genomes that they ostensibly serve. Under this model, we expect rapid diversification of telomere-specialized retrotransposon lineages and, possibly, the breakdown of this ostensibly symbiotic relationship. Here we report data consistent with both predictions. By searching the raw reads of species in the 15-Myr-old melanogaster species group, we generated de novo jockey retrotransposon consensus sequences and used phylogenetic tree-building to delineate four distinct telomere-associated lineages. Recurrent gains, losses, and replacements account for this retrotransposon lineage diversity. In Drosophila biarmipes, telomere-specialized elements have disappeared completely. De novo assembly of long reads and cytogenetics confirmed this species-specific collapse of retrotransposon-dependent telomere elongation. Instead, telomere-restricted satellite DNA and DNA transposon fragments occupy the chromosome ends. We infer that D. biarmipes relies instead on a recombination-based mechanism conserved from yeast to flies to humans. Telomeric retrotransposon diversification and disappearance suggest that telomere elongation across Drosophila is shaped by persistently "selfish" machinery rather than by completely domesticated, symbiotic mobile elements.

    Updated: 2019-11-01
  • Systems analysis reveals complex biological processes during virus infection fate decisions.
    Genome Res. (IF 9.944) Pub Date : 2019-05-30
    Jordi Argilaguet,Mireia Pedragosa,Anna Esteve-Codina,Graciela Riera,Enric Vidal,Cristina Peligero-Cruz,Valentina Casella,David Andreu,Tsuneyasu Kaisho,Gennady Bocharov,Burkhard Ludewig,Simon Heath,Andreas Meyerhans

    Virus infection fate decisions result from a dynamic virus-immune system interaction that leads either to an efficient effector response and virus elimination or to an alleviated immune response and chronic infection; the underlying processes and mechanisms are poorly understood. Here, we characterized the host response to acute and chronic lymphocytic choriomeningitis virus (LCMV) infections by gene coexpression network analysis of time-resolved splenic transcriptomes. First, we found an early attenuation of inflammatory monocytes/macrophages prior to the onset of T cell exhaustion; second, we found a critical role of the XCL1-XCR1 communication axis during the functional adaptation of the T cell response to the chronic infection state. These findings not only reveal an important feedback mechanism that couples T cell exhaustion with the maintenance of a lower level of effector T cell response but also suggest therapy options to better control virus levels during the chronic infection phase.

    Updated: 2019-11-01
  • One minute analysis of 200 histone posttranslational modifications by direct injection mass spectrometry.
    Genome Res. (IF 9.944) Pub Date : 2019-05-28
    Simone Sidoli,Yekaterina Kori,Mariana Lopes,Zuo-Fei Yuan,Hee Jong Kim,Katarzyna Kulej,Kevin A Janssen,Laura M Agosto,Julia Pinheiro Chagas da Cunha,Andrew J Andrews,Benjamin A Garcia

    DNA and histone proteins define the structure and composition of chromatin. Histone posttranslational modifications (PTMs) are covalent chemical groups capable of modulating chromatin accessibility, mostly through their ability to recruit enzymes responsible for DNA readout and remodeling. Mass spectrometry (MS)-based proteomics is the methodology of choice for large-scale identification and quantification of protein PTMs, including those on histones. High-sensitivity proteomics requires online coupling of MS with nano-liquid chromatography (nanoLC), which has relatively low throughput and limited robustness, and, for histone proteins, a 2-day sample preparation that includes histone purification, derivatization, and digestion. We present a new protocol that yields quantitative data on about 200 histone PTMs from tissue or cell lines in 7 h from start to finish. This protocol includes 4 h of histone extraction, 3 h of derivatization and digestion, and only 1 min of MS analysis via direct injection (DI-MS). We demonstrate that this sample preparation can be parallelized for 384 samples by using multichannel pipettes and 96-well plates. We also engineered the sequence of a synthetic "histone-like" peptide to spike into the sample; its derivatization and digestion benchmark the quality of the sample preparation. We verified that DI-MS does not introduce biases in histone peptide ionization compared with nanoLC-MS/MS by producing and analyzing a library of synthetically modified histone peptides mixed in equal molarity. Finally, we introduce EpiProfileLite for comprehensive analysis of this new data type. Altogether, our workflow is suitable for high-throughput screening of >1000 samples per day using a single mass spectrometer.

    Updated: 2019-11-01
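A back-of-the-envelope check of the histone DI-MS throughput claim above, assuming (as a simplification, not a figure from the study) that the 1-min direct injection is the only per-sample instrument time and that the mass spectrometer runs around the clock with no overhead:

```latex
\[
\frac{24\ \text{h/day} \times 60\ \text{min/h}}{1\ \text{min/sample}}
  = 1440\ \text{samples/day} > 1000\ \text{samples/day}
\]
```

The gap between this idealized ceiling and the stated >1000 samples per day leaves room for injection and washing overhead that the simplified calculation ignores.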
  • Human contamination in bacterial genomes has created thousands of spurious proteins.
    Genome Res. (IF 9.944) Pub Date : 2019-05-09
    Florian P Breitwieser,Mihaela Pertea,Aleksey V Zimin,Steven L Salzberg

    Contaminant sequences that appear in published genomes can cause numerous problems for downstream analyses, particularly for evolutionary studies and metagenomics projects. Our large-scale scan of complete and draft bacterial and archaeal genomes in the NCBI RefSeq database reveals that 2250 genomes are contaminated by human sequence. The contaminant sequences derive primarily from high-copy human repeat regions, which themselves are not adequately represented in the current human reference genome, GRCh38. The absence of the sequences from the human assembly offers a likely explanation for their presence in bacterial assemblies. In some cases, the contaminating contigs have been erroneously annotated as containing protein-coding sequences, which over time have propagated to create spurious protein "families" across multiple prokaryotic and eukaryotic genomes. As a result, 3437 spurious protein entries are currently present in the widely used nr and TrEMBL protein databases. We report here an extensive list of contaminant sequences in bacterial genome assemblies and the proteins associated with them. We found that nearly all contaminants occurred in small contigs in draft genomes, which suggests that filtering out small contigs from draft genome assemblies may mitigate the issue of contamination while still keeping nearly all of the genuine genomic sequences.

    Updated: 2019-11-01
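The contamination study above suggests that filtering out small contigs from draft assemblies may remove most human contamination while keeping nearly all genuine sequence. A minimal sketch of such a length filter over FASTA text follows; the 1,000-bp default cutoff and the helper names `read_fasta`, `filter_small_contigs`, and `retained_fraction` are hypothetical choices for illustration, not values or tools from the study.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_small_contigs(records, min_len: int = 1000):
    """Split contigs into (kept, dropped) lists; min_len is a hypothetical cutoff."""
    kept = [(h, s) for h, s in records if len(s) >= min_len]
    dropped = [(h, s) for h, s in records if len(s) < min_len]
    return kept, dropped


def retained_fraction(kept, records) -> float:
    """Fraction of total assembly bases that survive the length filter."""
    total = sum(len(s) for _, s in records)
    return sum(len(s) for _, s in kept) / total if total else 0.0
```

Reporting the retained base fraction alongside the dropped contig list makes the trade-off explicit: a useful cutoff discards many short contigs while `retained_fraction` stays close to 1.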
  • Plasmid detection and assembly in genomic and metagenomic data sets.
    Genome Res. (IF 9.944) Pub Date : 2019-05-03
    Dmitry Antipov,Mikhail Raiko,Alla Lapidus,Pavel A Pevzner

    Although plasmids are important for bacterial survival and adaptation, plasmid detection and assembly from genomic, let alone metagenomic, samples remain challenging. The recently developed plasmidSPAdes assembler addressed some of these challenges for isolate genomes but stopped short of detecting plasmids in metagenomic assemblies, an untapped source of yet-to-be-discovered plasmids. We present the metaplasmidSPAdes tool for plasmid assembly in metagenomic data sets, which reduces the false-positive rate of plasmid detection compared with state-of-the-art approaches. We assembled plasmids in diverse data sets and showed that thousands of plasmids went undetected in already completed genomic and metagenomic studies. Our analysis revealed the extreme variability of plasmids and led to the discovery of many novel plasmids (including many carrying antibiotic-resistance genes) without significant similarity to currently known ones.

    Updated: 2019-11-01
  • Convergent recombination cessation between mating-type genes and centromeres in selfing anther-smut fungi.
    Genome Res. (IF 9.944) Pub Date : 2019-05-03
    Fantin Carpentier,Ricardo C Rodríguez de la Vega,Sara Branco,Alodie Snirc,Marco A Coelho,Michael E Hood,Tatiana Giraud

    The degree of selfing has major impacts on adaptability and is often controlled by molecular mechanisms determining mating compatibility. Changes in compatibility systems are therefore important evolutionary events, but their underlying genomic mechanisms are often poorly understood. Fungi display frequent shifts in compatibility systems, and their small genomes facilitate elucidation of the mechanisms involved. In particular, linkage between the pre- and postmating compatibility loci has evolved repeatedly, increasing the odds of gamete compatibility under selfing. Here, we studied the mating-type chromosomes of two anther-smut fungi with unlinked mating-type loci despite a self-fertilization mating system. Segregation analyses and comparisons of high-quality genome assemblies revealed that these two species displayed linkage between mating-type loci and their respective centromeres. Under the automictic mating (intratetrad selfing) of anther-smut fungi, this arrangement confers the same improved odds of gamete compatibility as direct linkage of the two mating-type loci. Recombination cessation was associated with a large inversion in only one of the four linkage events. The lack of trans-specific polymorphism at genes located in nonrecombining regions and linkage date estimates indicated that the events of recombination cessation occurred independently in the two sister species. Our study shows that natural selection can repeatedly lead to similar genomic patterns and phenotypes, and that different evolutionary paths can lead to distinct yet equally beneficial responses to selection. Our study further highlights that automixis and gene linkage to centromeres have important genetic and evolutionary consequences, yet remain poorly recognized despite occurring in a broad range of taxa.

    Updated: 2019-11-01
  • Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid.
    Genome Res. (IF 9.944) Pub Date : 2019-04-18
    Steve Miller,Samia N Naccache,Erik Samayoa,Kevin Messacar,Shaun Arevalo,Scot Federman,Doug Stryke,Elizabeth Pham,Becky Fung,William J Bolosky,Danielle Ingebrigtsen,Walter Lorizio,Sandra M Paff,John A Leake,Rick Pesano,Roberta DeBiasi,Samuel Dominguez,Charles Y Chiu

    Metagenomic next-generation sequencing (mNGS) for pan-pathogen detection has been successfully tested in proof-of-concept case studies in patients with acute illness of unknown etiology but to date has been largely confined to research settings. Here, we developed and validated a clinical mNGS assay for diagnosis of infectious causes of meningitis and encephalitis from cerebrospinal fluid (CSF) in a licensed microbiology laboratory. A customized bioinformatics pipeline, SURPI+, was developed to rapidly analyze mNGS data, generate an automated summary of detected pathogens, and provide a graphical user interface for evaluating and interpreting results. We established quality metrics, threshold values, and limits of detection of 0.2-313 genomic copies or colony forming units per milliliter for each representative organism type. Gross hemolysis and excess host nucleic acid reduced assay sensitivity; however, spiked phages used as internal controls were reliable indicators of sensitivity loss. Diagnostic test accuracy was evaluated by blinded mNGS testing of 95 patient samples, revealing 73% sensitivity and 99% specificity compared to original clinical test results, and 81% positive percent agreement and 99% negative percent agreement after discrepancy analysis. Subsequent mNGS challenge testing of 20 positive CSF samples prospectively collected from a cohort of pediatric patients hospitalized with meningitis, encephalitis, and/or myelitis showed 92% sensitivity and 96% specificity relative to conventional microbiological testing of CSF in identifying the causative pathogen. These results demonstrate the analytic performance of a laboratory-validated mNGS assay for pan-pathogen detection, to be used clinically for diagnosis of neurological infections from CSF.

    Updated: 2019-11-01
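The mNGS validation above reports sensitivity and specificity against clinical results, and positive/negative percent agreement (PPA/NPA) after discrepancy analysis. These are ratios over a 2×2 comparison table; PPA and NPA use the same formulas but are named differently when the comparator is not treated as a gold standard. The sketch below shows the standard definitions; the class name `TwoByTwo` and the counts in the usage note are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a test (e.g., mNGS) with a reference method."""

    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    def sensitivity(self) -> float:
        # Called positive percent agreement (PPA) when the reference is not a gold standard.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Called negative percent agreement (NPA) when the reference is not a gold standard.
        return self.tn / (self.tn + self.fp)
```

For hypothetical counts `TwoByTwo(tp=8, fp=1, fn=2, tn=89)`, `sensitivity()` is `0.8` and `specificity()` is 89/90, about 0.989.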
  • Structural variants in 3000 rice genomes.
    Genome Res. (IF 9.944) Pub Date : 2019-04-18
    Roven Rommel Fuentes,Dmytro Chebotarov,Jorge Duitama,Sean Smith,Juan Fernando De la Hoz,Marghoob Mohiyuddin,Rod A Wing,Kenneth L McNally,Tatiana Tatarinova,Andrey Grigoriev,Ramil Mauleon,Nickolai Alexandrov

    Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5' UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.

    Updated: 2019-11-01
  • ATAC-seq reveals regional differences in enhancer accessibility during the establishment of spatial coordinates in the Drosophila blastoderm.
    Genome Res. (IF 9.944) Pub Date : 2019-04-10
    Marta Bozek,Roberto Cortini,Andrea Ennio Storti,Ulrich Unnerstall,Ulrike Gaul,Nicolas Gompel

    Establishment of spatial coordinates during Drosophila embryogenesis relies on differential regulatory activity of axis patterning enhancers. Concentration gradients of activator and repressor transcription factors (TFs) provide positional information to each enhancer, which in turn promotes transcription of a target gene in a specific spatial pattern. However, the interplay between an enhancer's regulatory activity and its accessibility, as determined by local chromatin organization, is not well understood. We profiled chromatin accessibility with ATAC-seq in narrow, genetically tagged domains along the antero-posterior axis in the Drosophila blastoderm. We demonstrate that one-quarter of the accessible genome displays significant regional variation in its ATAC-seq signal immediately after zygotic genome activation. Axis patterning enhancers are enriched among the most variable intervals, and their accessibility changes correlate with their regulatory activity. In an embryonic domain where an enhancer receives a net activating TF input and promotes transcription, it displays elevated accessibility compared with a domain where it receives a net repressive input. We propose that differential accessibility is a signature of patterning cis-regulatory elements in the Drosophila blastoderm and discuss potential mechanisms by which accessibility of enhancers may be modulated by activator and repressor TFs.

    Updated: 2019-11-01
  • Identification of a primitive intestinal transcription factor network shared between esophageal adenocarcinoma and its precancerous precursor state.
    Genome Res. (IF 9.944) Pub Date : 2019-04-10
    Connor Rogerson,Edward Britton,Sarah Withey,Neil Hanley,Yeng S Ang,Andrew D Sharrocks

    Esophageal adenocarcinoma (EAC) is one of the most frequent causes of cancer death, and yet compared to other common cancers, we know relatively little about the molecular composition of this tumor type. To further our understanding of this cancer, we have used open chromatin profiling to decipher the transcriptional regulatory networks that are operational in EAC. We have uncovered a transcription factor network that is usually found in primitive intestinal cells during embryonic development, centered on HNF4A and GATA6. These transcription factors work together to control the EAC transcriptome. We show that this network is activated in Barrett's esophagus, the putative precursor state to EAC, thereby providing novel molecular evidence in support of stepwise malignant transition. Furthermore, we show that HNF4A alone is sufficient to drive chromatin opening and activation of a Barrett's-like chromatin signature when expressed in normal human epithelial cells. Collectively, these data provide a new way to categorize EAC at a genome scale and implicate HNF4A activation as a potential pivotal event in its malignant transition from healthy cells.

    Updated: 2019-11-01
  • Interplay between coding and exonic splicing regulatory sequences.
    Genome Res. (IF 9.944) Pub Date : 2019-04-10
    Nicolas Fontrodona,Fabien Aubé,Jean-Baptiste Claude,Hélène Polvèche,Sébastien Lemaire,Léon-Charles Tranchevent,Laurent Modolo,Franck Mortreux,Cyril F Bourgeois,Didier Auboeuf

    The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we find that the TRA2A and TRA2B splicing factors, which bind adenine-rich motifs, promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2, which binds guanine/cytosine-rich motifs, promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3, which binds cytosine-rich motifs, promotes the inclusion of exons coding preferentially for uncharged amino acids, such as serine and threonine, which can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor, which relies on the affinity of this factor for specific nucleotides, is tightly interconnected with the physicochemical properties encoded by the exon. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.

    Updated: 2019-11-01
  • DNA (de)methylation in embryonic stem cells controls CTCF-dependent chromatin boundaries.
    Genome Res. (IF 9.944) Pub Date : 2019-04-06
    Laura Wiehle,Graeme J Thorn,Günter Raddatz,Christopher T Clarkson,Karsten Rippe,Frank Lyko,Achim Breiling,Vladimir B Teif

    Coordinated changes of DNA (de)methylation, nucleosome positioning, and chromatin binding of the architectural protein CTCF play an important role in establishing cell-type-specific chromatin states during differentiation. To elucidate molecular mechanisms that link these processes, we studied the perturbed DNA modification landscape in mouse embryonic stem cells (ESCs) carrying a double knockout (DKO) of the Tet1 and Tet2 dioxygenases. These enzymes are responsible for the conversion of 5-methylcytosine (5mC) into its hydroxymethylated (5hmC), formylated (5fC), or carboxylated (5caC) forms. We determined changes in nucleosome positioning, CTCF binding, DNA methylation, and gene expression in DKO ESCs and developed biophysical models to predict differential CTCF binding. Methylation-sensitive nucleosome repositioning accounted for a significant portion of CTCF binding loss in DKO ESCs, whereas unmethylated and nucleosome-depleted CpG islands were enriched for CTCF sites that remained occupied. A number of CTCF sites also displayed direct correlations with the CpG modification state: CTCF was preferentially lost from sites that were marked with 5hmC in wild-type (WT) cells but not from 5fC-enriched sites. In addition, we found that some CTCF sites can act as bifurcation points defining the differential methylation landscape. CTCF loss from such sites, for example, at promoters, boundaries of chromatin loops, and topologically associated domains (TADs), was correlated with DNA methylation/demethylation spreading and can be linked to down-regulation of neighboring genes. Our results reveal a hierarchical interplay between cytosine modifications, nucleosome positions, and DNA sequence that determines differential CTCF binding and regulates gene expression.

    Updated: 2019-11-01
  • Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly.
    Genome Res. (IF 9.944) Pub Date : 2019-04-04
    Ou Wang,Robert Chin,Xiaofang Cheng,Michelle Ka Yan Wu,Qing Mao,Jingbo Tang,Yuhui Sun,Ellis Anderson,Han K Lam,Dan Chen,Yujun Zhou,Linying Wang,Fei Fan,Yan Zou,Yinlong Xie,Rebecca Yu Zhang,Snezana Drmanac,Darlene Nguyen,Chongjun Xu,Christian Villarosa,Scott Gablenz,Nina Barua,Staci Nguyen,Wenlan Tian,Jia Sophie Liu,Jingwan Wang,Xiao Liu,Xiaojuan Qi,Ao Chen,He Wang,Yuliang Dong,Wenwei Zhang,Andrei Alexeev,Huanming Yang,Jian Wang,Karsten Kristiansen,Xun Xu,Radoje Drmanac,Brock A Peters

    Here, we describe single-tube long fragment read (stLFR), a technology that enables the generation of sequencing data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block N50 lengths of up to 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.

    Updated: 2019-11-01
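The stLFR barcode figures above follow from combinatorial assembly: if each bead barcode is built from `rounds` sequential picks among `per_round` distinct sub-barcodes, there are `per_round ** rounds` possible barcodes. The per-round count of 1,536 used in the usage note is an assumption chosen because 1,536^3 ≈ 3.6 × 10^9 matches the abstract's figure; the collision estimate further assumes each bead receives a barcode uniformly at random, a simplification of the real chemistry. Both helper names are hypothetical.

```python
import math


def combinatorial_barcodes(per_round: int, rounds: int) -> int:
    """Distinct barcodes from `rounds` sequential picks among `per_round` options."""
    return per_round ** rounds


def shared_fraction(n_used: int, n_possible: int) -> float:
    """Probability that one bead's barcode also appears on at least one of the
    other n_used - 1 beads, assuming uniform random barcode assignment."""
    # 1 - (1 - 1/N)^(n - 1), computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1((n_used - 1) * math.log1p(-1.0 / n_possible))
```

Under these assumptions, `combinatorial_barcodes(1536, 3)` gives 3,623,878,656, and `shared_fraction(50_000_000, 3_623_878_656)` is roughly 0.014, so only about 1.4% of beads would share a barcode with another bead.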
  • A new approach for rare variation collapsing on functional protein domains implicates specific genic regions in ALS.
    Genome Res. (IF 9.944) Pub Date : 2019-04-04
    Sahar Gelfman,Sarah Dugger,Cristiane de Araujo Martins Moreno,Zhong Ren,Charles J Wolock,Neil A Shneider,Hemali Phatnani,Elizabeth T Cirulli,Brittany N Lasseigne,Tim Harris,Tom Maniatis,Guy A Rouleau,Robert H Brown,Aaron D Gitler,Richard M Myers,Slavé Petrovski,Andrew Allen,David B Goldstein,Matthew B Harms

    Large-scale sequencing efforts in amyotrophic lateral sclerosis (ALS) have implicated novel genes using gene-based collapsing methods. However, pathogenic mutations may be concentrated in specific genic regions. To address this, we developed two collapsing strategies: one uses homology-based protein domains as the unit for rare-variant collapsing, and the other is a gene-level approach that, unlike standard methods, leverages existing evidence of purifying selection against missense variation in those domains. The application of these two collapsing methods to 3093 ALS cases and 8186 controls of European ancestry, and also to 3239 cases and 11,808 controls of diversified populations, pinpoints risk regions of ALS genes, including SOD1, NEK1, TARDBP, and FUS. While not clearly implicating novel ALS genes, the new analyses pinpoint risk regions in known genes and also highlight candidate genes.

    Updated: 2019-11-01
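The ALS study above collapses rare variation using protein domains rather than whole genes as the unit. A generic collapsing test reduces each individual's qualifying variants within the unit to a binary carrier status and then compares carrier counts between cases and controls. The sketch below uses a one-sided Fisher's exact test written out from the hypergeometric distribution; the data layout, helper names, and choice of test are illustrative assumptions, not the authors' exact pipeline.

```python
from math import comb


def carriers_in_domain(variants, domain):
    """IDs of individuals with at least one qualifying variant in [start, end].

    variants: iterable of (individual_id, protein_position) for rare qualifying variants.
    domain: (start, end) residue coordinates, inclusive.
    """
    start, end = domain
    return {ind for ind, pos in variants if start <= pos <= end}


def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_ctrls):
    """P(X >= case_carriers) under the hypergeometric null (enrichment in cases)."""
    total_carriers = case_carriers + ctrl_carriers
    n_total = n_cases + n_ctrls
    denom = comb(n_total, n_cases)
    p = 0.0
    # Sum probabilities of tables at least as extreme as the observed one.
    for k in range(case_carriers, min(total_carriers, n_cases) + 1):
        p += comb(total_carriers, k) * comb(n_total - total_carriers, n_cases - k) / denom
    return p
```

In use, the case carrier count is the size of the intersection between `carriers_in_domain(...)` and the set of case IDs, and likewise for controls; running the test per domain instead of per gene is what lets a signal concentrated in one region stand out.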
  • The accessible chromatin landscape of the murine hippocampus at single-cell resolution.
    Genome Res. (IF 9.944) Pub Date : 2019-04-03
    John R Sinnamon,Kristof A Torkenczy,Michael W Linhoff,Sarah A Vitak,Ryan M Mulqueen,Hannah A Pliner,Cole Trapnell,Frank J Steemers,Gail Mandel,Andrew C Adey

    Here we present a comprehensive map of the accessible chromatin landscape of the mouse hippocampus at single-cell resolution. Substantial advances in this work include the optimization of a single-cell combinatorial indexing assay for transposase accessible chromatin (sci-ATAC-seq); a software suite, scitools, for the rapid processing and visualization of single-cell combinatorial indexing data sets; and a valuable resource of hippocampal regulatory networks at single-cell resolution. We used sci-ATAC-seq to produce 2346 high-quality single-cell chromatin accessibility maps with a mean unique read count per cell of 29,201 from both fresh and frozen hippocampi, observing little difference in accessibility patterns between the preparations. Using this data set, we identified eight distinct major clusters of cells representing both neuronal and nonneuronal cell types and characterized the driving regulatory factors and differentially accessible loci that define each cluster. Within pyramidal neurons, we identified four major clusters, including CA1 and CA3 neurons, and three additional subclusters. We then applied a recently described coaccessibility framework, Cicero, which identified 146,818 links between promoters and putative distal regulatory DNA. Identified coaccessibility networks showed cell-type specificity, shedding light on key dynamic loci that reconfigure to specify hippocampal cell lineages. Lastly, we performed an additional sci-ATAC-seq preparation from cultured hippocampal neurons (899 high-quality cells, 43,532 mean unique reads) that revealed substantial alterations in their epigenetic landscape compared with nuclei from hippocampal tissue. This data set and accompanying analysis tools provide a new resource that can guide subsequent studies of the hippocampus.

    Updated: 2019-11-01
  • Analysis of 100 high-coverage genomes from a pedigreed captive baboon colony.
    Genome Res. (IF 9.944) Pub Date : 2019-03-31
    Jacqueline A Robinson,Saurabh Belsare,Shifra Birnbaum,Deborah E Newman,Jeannie Chan,Jeremy P Glenn,Betsy Ferguson,Laura A Cox,Jeffrey D Wall

    Baboons (genus Papio) are broadly studied in the wild and in captivity. They are widely used as a nonhuman primate model for biomedical studies, and the Southwest National Primate Research Center (SNPRC) at Texas Biomedical Research Institute has maintained a large captive baboon colony for more than 50 yr. Unlike those for other model organisms, however, genomic resources for baboons are severely lacking. This has hindered the progress of studies using baboons as a model for basic biology or human disease. Here, we describe a data set of 100 high-coverage whole-genome sequences obtained from the mixed colony of olive (P. anubis) and yellow (P. cynocephalus) baboons housed at the SNPRC. These data provide a comprehensive catalog of common genetic variation in baboons, as well as a fine-scale genetic map. We show how the data can be used to learn about ancestry and admixture and to correct errors in the colony records. Finally, we investigated the consequences of inbreeding within the SNPRC colony and found clear evidence for increased rates of infant mortality and increased homozygosity of putatively deleterious alleles in inbred individuals.

    Updated: 2019-11-01
  • Brown rat demography reveals pre-commensal structure in eastern Asia before expansion into Southeast Asia.
    Genome Res. (IF 9.944) Pub Date : 2019-03-27
    Emily E Puckett,Jason Munshi-South

    Fossil evidence indicates that the globally distributed brown rat (Rattus norvegicus) originated in northern China and Mongolia. Historical records report the human-mediated invasion of rats into Europe in the 1500s, followed by global spread because of European imperialist activity during the 1600s-1800s. We analyzed 14 genomes representing seven previously identified evolutionary clusters and tested alternative demographic models to infer patterns of range expansion, divergence times, and changes in effective population size (Ne) for this globally important pest species. We observed three range expansions from the ancestral population that produced the Pacific (diverged ∼16.1 kya), eastern China (∼17.5 kya), and Southeast (SE) Asia (∼0.86 kya) lineages. Our model shows a rapid range expansion from SE Asia into the Middle East and then continued expansion into central Europe 788 yr ago (1227 AD). We observed declining Ne within all brown rat lineages from 150 to 1 kya, reflecting population contractions during glacial cycles. Ne has increased over the past 1 kyr in Asian and European, but not in Pacific, evolutionary clusters. Our results support the hypothesis that northern Asia was the ancestral range for brown rats. We suggest that southward human migration across China between the 800s and 1550s AD resulted in the introduction of rats to SE Asia, from which they rapidly expanded via existing maritime trade routes. Finally, we discovered that North America was colonized separately on the Atlantic and Pacific seaboards by evolutionary clusters of vastly different ages and genomic diversity levels. Our results should stimulate discussions among historians and zooarcheologists regarding the relationship between humans and rats.

    Updated: 2019-11-01
  • Chromothripsis during telomere crisis is independent of NHEJ, and consistent with a replicative origin.
    Genome Res. (IF 9.944) Pub Date : 2019-03-16
    Kez Cleal,Rhiannon E Jones,Julia W Grimstead,Eric A Hendrickson,Duncan M Baird

    Telomere erosion, dysfunction, and fusion can lead to a state of cellular crisis characterized by large-scale genome instability. We investigated the impact of a telomere-driven crisis on the structural integrity of the genome by undertaking whole-genome sequence analyses of clonal populations of cells that had escaped crisis. Quantification of large-scale structural variants revealed patterns of rearrangement consistent with chromothripsis but formed in the absence of functional nonhomologous end-joining pathways. Rearrangements frequently consisted of short fragments with complex mutational patterns, and their repair topology deviated from randomness, showing preferential repair to local regions or exchange between specific loci. We find evidence of telomere involvement, with an enrichment of fold-back inversions demarcating clusters of rearrangements. Our data suggest that chromothriptic rearrangements caused by a telomere crisis arise via a replicative repair process involving template switching.

    Updated: 2019-11-01
  • A virome-wide clonal integration analysis platform for discovering cancer viral etiology.
    Genome Res. (IF 9.944) Pub Date : 2019-03-16
    Xun Chen,Jason Kost,Arvis Sulovari,Nathalie Wong,Winnie S Liang,Jian Cao,Dawei Li

    Oncoviral infection is responsible for 12%-15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the process of identifying genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform that uses whole-genome sequencing (WGS) data to identify viral integrations that are derived from any characterized virus and shared by a large proportion of tumor cells. Its sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK virus (BKV), suggesting the involvement of these viruses in early stages of tumorigenesis in affected tumors, such as HBV in TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma. We also showed the capacity of VIcaller to identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching for clonal viral integrations with our platform has the capacity to identify virus-caused cancers and discover cancer viral etiologies.

    Updated: 2019-11-01
  • Accurate analysis of genuine CRISPR editing events with ampliCan.
    Genome Res. (IF 9.944) Pub Date : 2019-03-10
    Kornel Labun,Xiaoge Guo,Alejandro Chavez,George Church,James A Gagnon,Eivind Valen

    We present ampliCan, an analysis tool for genome editing that unites highly precise quantification and visualization of genuine genome editing events. ampliCan features nuclease-optimized alignments, filtering of experimental artifacts, event-specific normalization, and off-target read detection and quantifies insertions, deletions, HDR repair, as well as targeted base editing. It is scalable to thousands of amplicon sequencing-based experiments from any genome editing experiment, including CRISPR. It enables automated integration of controls and accounts for biases at every step of the analysis. We benchmarked ampliCan on both real and simulated data sets against other leading tools, demonstrating that it outperformed all in the face of common confounding factors.
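
The abstract lists the event types ampliCan quantifies but not how it counts them. The sketch below covers only the final counting step. It assumes reads are already aligned to the amplicon and summarized as SAM-style CIGAR strings, and it tallies insertion and deletion events from them. `count_indel_events` and `fraction_edited` are hypothetical helpers, not ampliCan functions. Nothing here reproduces ampliCan's nuclease-optimized alignment, artifact filtering, or normalization.

```python
import re
from collections import Counter

# SAM CIGAR operations; I = insertion relative to the reference, D = deletion.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indel_events(cigars):
    """Tally insertion and deletion events (not bases) across aligned reads."""
    events = Counter()
    for cigar in cigars:
        for _length, op in CIGAR_RE.findall(cigar):
            if op == "I":
                events["insertion"] += 1
            elif op == "D":
                events["deletion"] += 1
    return events


def fraction_edited(cigars):
    """Fraction of reads carrying at least one insertion or deletion."""
    edited = sum(
        1 for cigar in cigars
        if any(op in "ID" for _length, op in CIGAR_RE.findall(cigar))
    )
    return edited / len(cigars)


reads = ["50M", "20M3D30M", "25M2I23M", "10M1D10M1I28M"]
print(dict(count_indel_events(reads)))  # {'deletion': 2, 'insertion': 2}
print(fraction_edited(reads))           # 0.75
```

Counting events rather than bases keeps a single 3-bp deletion from being weighted three times as heavily as a 1-bp deletion.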

    Updated: 2019-11-01
  • Differences in firing efficiency, chromatin, and transcription underlie the developmental plasticity of the Arabidopsis DNA replication origins.
    Genome Res. (IF 9.944) Pub Date : 2019-03-09
    Joana Sequeira-Mendes,Zaida Vergara,Ramon Peiró,Jordi Morata,Irene Aragüez,Celina Costas,Raul Mendez-Giraldez,Josep M Casacuberta,Ugo Bastolla,Crisanto Gutierrez

    Eukaryotic genome replication depends on thousands of DNA replication origins (ORIs). A major challenge is to learn ORI biology in multicellular organisms in the context of growing organs to understand their developmental plasticity. We have identified a set of ORIs of Arabidopsis thaliana and their chromatin landscape at two stages of post-embryonic development. ORIs associate with multiple chromatin signatures including transcription start sites (TSS) but also proximal and distal regulatory regions and heterochromatin, where ORIs colocalize with retrotransposons. In addition, quantitative analysis of ORI activity led us to conclude that strong ORIs have high GC content and clusters of GGN trinucleotides. Development primarily influences ORI firing strength rather than ORI location. ORIs that preferentially fire at early developmental stages colocalize with GC-rich heterochromatin, but at later stages with transcribed genes, perhaps as a consequence of changes in chromatin features associated with developmental processes. Our study provides the set of ORIs active in an organism at the post-embryo stage that should allow us to study ORI biology in response to development, environment, and mutations with a quantitative approach. In a wider scope, the computational strategies developed here can be transferred to other eukaryotic systems.
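
Strong ORIs are reported to have two sequence features: high GC content and clusters of GGN trinucleotides. Both are simple to compute. The sketch below assumes a plain DNA string for a candidate ORI region. The helper names are hypothetical, and the study's quantitative model of ORI strength is not reproduced.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def count_ggn(seq):
    """Count overlapping GGN trinucleotides (GG followed by any base) on one strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 2) if seq[i] == "G" and seq[i + 1] == "G")


region = "GGAGGCTTGGG"
print(round(gc_content(region), 3))  # 0.727
print(count_ggn(region))             # 3
```

Only one strand is scanned. A GGN on the reverse strand appears on this strand as any base followed by CC, so a two-strand score would also count those positions.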

    Updated: 2019-11-01
  • Single-pollen-cell sequencing for gamete-based phased diploid genome assembly in plants.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Dongqing Shi,Jun Wu,Haibao Tang,Hao Yin,Hongtao Wang,Ran Wang,Runze Wang,Ming Qian,Juyou Wu,Kaijie Qi,Zhihua Xie,Zhiwen Wang,Xiang Zhao,Shaoling Zhang

    Genome assemblies from diploid organisms create mosaic sequences alternating between parental alleles, which can create erroneous gene models and other problems. In animals, a popular strategy to generate haploid genome-resolved assemblies has been the sampling of (haploid) gametes, and the advent of single-cell sequencing has further advanced such methods. However, several challenges for the isolation and amplification of DNA from plant gametes have limited such approaches in plants. Here, we combined a new approach for pollen protoplast isolation with a single-cell DNA amplification technique and then used a "barcoding" bioinformatics strategy to incorporate haploid-specific sequence data from 12 pollen cells, ultimately enabling the efficient and accurate phasing of the pear genome into its A and B haploid genomes. Beyond revealing that 8.12% of the genes in the pear reference genome feature mosaic assemblies and enabling a previously impossible analysis of allelic effects in pear gene expression, our new haploid genome assemblies provide high-resolution information about recombination during meiosis in pollen. Considering that outcrossing pear is an angiosperm species featuring very high heterozygosity, our method for rapidly phasing genome assemblies is potentially applicable to several yet-unsequenced outcrossing angiosperm species in nature.

    Updated: 2019-11-01
  • A chromosome-level assembly of the Atlantic herring genome: detection of a supergene and other signals of selection.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Mats E Pettersson,Christina M Rochus,Fan Han,Junfeng Chen,Jason Hill,Ola Wallerman,Guangyi Fan,Xiaoning Hong,Qiwu Xu,He Zhang,Shanshan Liu,Xin Liu,Leanne Haggerty,Toby Hunt,Fergal J Martin,Paul Flicek,Ignas Bunikis,Arild Folkvord,Leif Andersson

    The Atlantic herring is a model species for exploring the genetic basis for ecological adaptation, due to its huge population size and extremely low genetic differentiation at selectively neutral loci. However, such studies have so far been hampered by a highly fragmented genome assembly. Here, we deliver a chromosome-level genome assembly based on a hybrid approach combining a de novo Pacific Biosciences (PacBio) assembly with Hi-C-supported scaffolding. The assembly comprises 26 autosomes with sizes ranging from 12.4 to 33.1 Mb and a total size, in chromosomes, of 726 Mb, which has been corroborated by a high-resolution linkage map. A comparison of the herring genome assembly with other high-quality assemblies from bony fishes revealed few inter-chromosomal but frequent intra-chromosomal rearrangements. The improved assembly facilitates analysis of previously intractable large-scale structural variation, allowing, for example, the detection of a 7.8-Mb inversion on Chromosome 12 underlying ecological adaptation. This supergene shows strong genetic differentiation between populations. The chromosome-based assembly also markedly improves the interpretation of previously detected signals of selection, allowing us to reveal hundreds of independent loci associated with ecological adaptation.

    Updated: 2019-11-01
  • Identification and dynamic quantification of regulatory elements using total RNA.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Sascha H Duttke,Max W Chang,Sven Heinz,Christopher Benner

    The spatial and temporal regulation of transcription initiation is pivotal for controlling gene expression. Here, we introduce capped-small RNA-seq (csRNA-seq), which uses total RNA as starting material to detect transcription start sites (TSSs) of both stable and unstable RNAs at single-nucleotide resolution. csRNA-seq is highly sensitive to acute changes in transcription and identifies an order of magnitude more regulated transcripts than does RNA-seq. Interrogating tissues from species across the eukaryotic kingdoms identified unstable transcripts resembling enhancer RNAs, pri-miRNAs, antisense transcripts, and promoter upstream transcripts in multicellular animals, plants, and fungi spanning 1.6 billion years of evolution. Integration of epigenomic data from these organisms revealed that histone H3 lysine 4 trimethylation (H3K4me3) was largely confined to TSSs of stable transcripts, whereas H3K27ac marked nucleosomes downstream from all active TSSs, suggesting an ancient role for posttranslational histone modifications in transcription. Our findings show that total RNA is sufficient to identify transcribed regulatory elements and capture the dynamics of initiated stable and unstable transcripts at single-nucleotide resolution in eukaryotes.
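
csRNA-seq sequences capped short RNAs, so the 5' end of each aligned read is taken to mark a TSS. The sketch below tallies TSSs at single-nucleotide resolution. It assumes reads are already aligned and given as BED-style tuples. The `min_reads` cutoff and both function names are hypothetical. A full analysis would need further steps, such as background correction and clustering of nearby start sites, which are omitted here.

```python
from collections import Counter


def tss_counts(reads):
    """Count read 5' ends per (chrom, strand, position).

    reads: iterable of (chrom, strand, start, end) in 0-based, half-open
    (BED-style) coordinates. A minus-strand read's 5' end is at end - 1.
    """
    counts = Counter()
    for chrom, strand, start, end in reads:
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts


def call_tss(counts, min_reads=2):
    """Keep positions supported by at least min_reads 5' ends (hypothetical cutoff)."""
    return sorted(site for site, n in counts.items() if n >= min_reads)


reads = [
    ("chr1", "+", 100, 130),
    ("chr1", "+", 100, 125),
    ("chr1", "+", 101, 131),
    ("chr1", "-", 470, 500),
    ("chr1", "-", 475, 500),
]
print(call_tss(tss_counts(reads)))  # [('chr1', '+', 100), ('chr1', '-', 499)]
```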

    Updated: 2019-11-01
  • The subgenomes show asymmetric expression of alleles in hybrid lineages of Megalobrama amblycephala × Culter alburnus.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Li Ren,Wuhui Li,Qinbo Qin,He Dai,Fengming Han,Jun Xiao,Xin Gao,Jialin Cui,Chang Wu,Xiaojing Yan,Guoliang Wang,Guiming Liu,Jia Liu,Jiaming Li,Zhong Wan,Conghui Yang,Chun Zhang,Min Tao,Jing Wang,Kaikun Luo,Shi Wang,Fangzhou Hu,Rurong Zhao,Xuming Li,Min Liu,Hongkun Zheng,Rong Zhou,Yuqin Shu,Yude Wang,Qinfeng Liu,Chenchen Tang,Wei Duan,Shaojun Liu

    Hybridization drives rapid speciation by shaping novel genotypic and phenotypic profiles. Genomic incompatibility and transcriptome shock have been observed in hybrids, although this is rarer in animals than in plants. Using the newly sequenced genomes of the blunt snout bream (Megalobrama amblycephala [BSB]) and the topmouth culter (Culter alburnus [TC]), we focused on the sequence variation and gene expression changes in the reciprocal intergeneric hybrid lineages (F1-F3) of BSB × TC. A genome-wide transcriptional analysis identified 145-974 expressed recombinant genes in the successive generations of hybrid fish, suggesting the rapid emergence of allelic variation following hybridization. Some gradual changes of gene expression with additive and dominance effects and various cis and trans regulations were observed from F1 to F3 in the two hybrid lineages. These asymmetric patterns of gene expression represent the alternative strategies for counteracting deleterious effects of the subgenomes and improving adaptability of novel hybrids. Furthermore, we identified positive selection and additive expression patterns in transforming growth factor, beta 1b (tgfb1b), which may account for the morphological variations of the pharyngeal jaw in the two hybrid lineages. Our current findings provide insights into the evolution of vertebrate genomes immediately following hybridization.

    Updated: 2019-11-01
  • Genes essential for embryonic stem cells are associated with neurodevelopmental disorders.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Shahar Shohat,Sagiv Shifman

    Mouse embryonic stem cells (mESCs) are key components in generating mouse models for human diseases and performing basic research on pluripotency, yet the number of genes essential for mESCs is still unknown. We performed a genome-wide screen for essential genes in mESCs and compared it to screens in human cells. We found that essential genes are enriched for basic cellular functions, are highly expressed in mESCs, and tend to lack paralog genes. We discovered that genes that are essential specifically in mESCs play a role in pathways associated with their pluripotent state. We show that 29.5% of human genes intolerant to loss-of-function mutations are essential in mouse or human ESCs, and that the human phenotypes most significantly associated with genes essential for ESCs are neurodevelopmental. Our results provide insights into essential genes in the mouse, the pathways which govern pluripotency, and suggest that many genes associated with neurodevelopmental disorders are essential at very early embryonic stages.

    Updated: 2019-11-01
  • Cotargeting among microRNAs in the brain.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Jennifer M Cherone,Vjola Jorgji,Christopher B Burge

    MicroRNAs (miRNAs) play roles in diverse developmental and disease processes. Distinct miRNAs have hundreds to thousands of conserved mRNA binding sites but typically direct only modest repression via single sites. Cotargeting of individual mRNAs by different miRNAs could potentially achieve stronger and more complex patterns of repression. By comparing target sets of different miRNAs, we identified hundreds of pairs of miRNAs that share more mRNA targets than expected (often by twofold or more) relative to stringent controls. Genetic perturbations revealed a functional overlap in neuronal differentiation for the cotargeting pair miR-138/miR-137. Clustering of all cotargeting pairs revealed a group of nine predominantly brain-enriched miRNAs that share many targets. In reporter assays, subsets of these miRNAs together repressed gene expression by five- to 10-fold, often showing cooperative repression. Together, our results uncover an unexpected pattern in which combinations of miRNAs collaborate to robustly repress cotargets, and suggest important developmental roles for cotargeting.
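
The core comparison asks whether two miRNAs share more targets than expected. It can be sketched with the simplest possible baseline. Under independent, random target sets, the expected overlap is |A|·|B|/N for a universe of N genes, and a hypergeometric upper tail gives a p-value. This baseline is a swapped-in illustration, since the study's "stringent controls" are not specified in the abstract. The function name and gene sets are hypothetical toy data.

```python
from math import comb


def overlap_enrichment(targets_a, targets_b, universe_size):
    """Observed vs. expected shared targets under independent random target sets."""
    a, b = set(targets_a), set(targets_b)
    observed = len(a & b)
    expected = len(a) * len(b) / universe_size
    fold = observed / expected if expected else float("inf")
    # Hypergeometric upper tail P(X >= observed): draw |b| genes from a
    # universe in which |a| genes count as "successes".
    p_value = sum(
        comb(len(a), k) * comb(universe_size - len(a), len(b) - k)
        for k in range(observed, min(len(a), len(b)) + 1)
    ) / comb(universe_size, len(b))
    return observed, expected, fold, p_value


a_targets = range(0, 100)    # toy target set for one miRNA
b_targets = range(50, 150)   # toy target set for another
observed, expected, fold, p_value = overlap_enrichment(a_targets, b_targets, 1000)
print(observed, expected, fold)  # 50 10.0 5.0
print(p_value < 1e-10)           # True
```

With 100 targets each drawn from 1,000 genes, about 10 shared targets are expected by chance. An overlap of 50 is therefore a fivefold enrichment.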

    Updated: 2019-11-01
  • FFPEcap-seq: a method for sequencing capped RNAs in formalin-fixed paraffin-embedded samples.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Jeffery M Vahrenkamp,Kathryn Szczotka,Mark K Dodson,Elke A Jarboe,Andrew P Soisson,Jason Gertz

    The majority of clinical cancer specimens are preserved as formalin-fixed paraffin-embedded (FFPE) samples. For clinical molecular tests to have wide-reaching impact, they must be applicable to FFPE material. Accurate quantitative measurements of RNA derived from FFPE specimens are challenging because of low yields and extensive degradation. Here, we present FFPEcap-seq, a method specifically designed for sequencing capped 5' ends of RNA derived from FFPE samples. FFPEcap-seq combines enzymatic enrichment of 5' capped RNAs with template switching to create sequencing libraries. We find that FFPEcap-seq can faithfully capture mRNA expression levels in FFPE specimens while also detecting enhancer RNAs that arise from distal regulatory regions. FFPEcap-seq is a fast and straightforward method for making high-quality 5' end RNA-seq libraries from FFPE-derived RNA.

    Updated: 2019-11-01
  • Dynamics of microRNA expression during mouse prenatal development.
    Genome Res. (IF 9.944) Pub Date : 2019-10-28
    Sorena Rahmanian,Rabi Murad,Alessandra Breschi,Weihua Zeng,Mark Mackiewicz,Brian Williams,Carrie A Davis,Brian Roberts,Sarah Meadows,Dianna Moore,Diane Trout,Chris Zaleski,Alex Dobin,Lei-Hoon Sei,Jorg Drenkow,Alex Scavelli,Thomas R Gingeras,Barbara J Wold,Richard M Myers,Roderic Guigó,Ali Mortazavi

    MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.

    Updated: 2019-11-01
  • SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data.
    Genome Res. (IF 9.944) Pub Date : 2019-10-20
    Hamim Zafar,Nicholas Navin,Ken Chen,Luay Nakhleh

    Accumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS data sets, including false-positives (FP) and false-negatives (FN) due to allelic dropout, and cell doublets, significantly complicate these tasks. Here, we propose a nonparametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone, and the evolutionary relationship between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental data sets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.
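
The error model treats each observed genotype call as a noisy copy of its clone's true genotype. The sketch below illustrates that idea under stated assumptions: binary calls with missing entries, a false-positive rate alpha, and a false-negative rate beta. It computes each cell's log-likelihood under every candidate clone and assigns the cell to its most likely clone. This is not SiCloneFit's Gibbs sampler, tree-structured Chinese restaurant process prior, finite-site model, or Beta-binomial doublet model. The function names, default rates, and toy matrices are hypothetical.

```python
import math


def cell_log_likelihood(observed, genotype, alpha, beta):
    """log P(observed | genotype) for one cell under a per-site FP/FN model.

    observed: list of 0, 1, or None (missing) per site
    genotype: list of 0/1 true genotypes for the candidate clone
    alpha: false-positive rate, P(obs=1 | true=0); must be in (0, 1)
    beta:  false-negative rate, P(obs=0 | true=1), e.g. allelic dropout
    """
    ll = 0.0
    for d, g in zip(observed, genotype):
        if d is None:
            continue  # missing calls contribute nothing
        if g == 0:
            ll += math.log(alpha if d == 1 else 1 - alpha)
        else:
            ll += math.log(1 - beta if d == 1 else beta)
    return ll


def assign_cells(matrix, clone_genotypes, alpha=0.01, beta=0.2):
    """Assign each cell to the clone with the highest log-likelihood."""
    assignments = []
    for cell in matrix:
        scores = [cell_log_likelihood(cell, g, alpha, beta) for g in clone_genotypes]
        assignments.append(scores.index(max(scores)))
    return assignments


clones = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
cells = [[0, 0, None, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
print(assign_cells(cells, clones))  # [0, 1, 2]
```

The second cell shows why beta matters. Its missing mutation at site 1 is more plausibly a dropout in clone 1 than a false positive at site 0 in clone 0.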

    Updated: 2019-11-01
  • PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.
    Genome Res. (IF 9.944) Pub Date : 2019-10-20
    Salem Malikic,Farid Rashidi Mehrabadi,Simone Ciccolella,Md Khaledur Rahman,Camir Ricketts,Ehsan Haghshenas,Daniel Seidman,Faraz Hach,Iman Hajirasouliha,S Cenk Sahinalp

    Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem, which integrates SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both as an integer linear program (ILP) and, as a first in tumor phylogeny reconstruction, as a Boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
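
The infinite sites assumption can be checked directly on a binary cell-by-mutation matrix. With an all-zero root, a matrix fits a perfect phylogeny exactly when no pair of mutation columns shows all three patterns (1,1), (1,0), and (0,1). The subperfect phylogeny problem tolerates a penalized number of ISA-violating mutations, so this condition applies to the mutations that remain. The sketch below only tests the condition. It does not implement the ILP or CSP formulation, and the function names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return mutation-column pairs (p, q) that violate the three-gamete condition."""
    n_cols = len(matrix[0])
    conflicts = []
    for p, q in combinations(range(n_cols), 2):
        patterns = {(row[p], row[q]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((p, q))
    return conflicts


def is_perfect_phylogeny(matrix):
    """True if the 0/1 matrix (rows = cells, columns = mutations) is conflict-free."""
    return not conflicting_pairs(matrix)


m1 = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]  # nested or disjoint columns: a valid tree
m2 = [[1, 0], [0, 1], [1, 1]]           # columns 0 and 1 conflict
print(is_perfect_phylogeny(m1))  # True
print(conflicting_pairs(m2))     # [(0, 1)]
```

In `m2`, a single false negative could explain the conflict. If a dropout hid mutation 1 in the first cell, flipping that entry to 1 makes mutation 1 contain mutation 0 and removes the conflict. Trading off flips like this against ISA violations is what the optimization scores.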

    Updated: 2019-11-01
  • Quantitative mitochondrial DNA copy number determination using droplet digital PCR with single-cell resolution.
    Genome Res. (IF 9.944) Pub Date : 2019-09-25
    Ryan O'Hara,Enzo Tedone,Andrew Ludlow,Ejun Huang,Beatrice Arosio,Daniela Mari,Jerry W Shay

    Mitochondria are involved in a number of diverse cellular functions, including energy production, metabolic regulation, apoptosis, calcium homeostasis, cell proliferation, and motility, as well as free radical generation. Mitochondrial DNA (mtDNA) is present at hundreds to thousands of copies per cell in a tissue-specific manner. mtDNA copy number also varies during aging and disease progression and therefore might be considered a biomarker that mirrors alterations within the human body. Here, we present a new quantitative, highly sensitive droplet digital PCR (ddPCR) method, droplet digital mitochondrial DNA measurement (ddMDM), to measure mtDNA copy number not only from cell populations but also from single cells. Our assay can generate data in as little as 3 h, is optimized for 96-well plates, and also allows the direct use of cell lysates without the need for DNA purification or nuclear reference genes. We show that ddMDM is able to detect differences between samples whose mtDNA copy numbers were so close as to be indistinguishable by other commonly used mtDNA quantitation methods. By utilizing ddMDM, we show quantitative changes in mtDNA content per cell across a wide variety of physiological contexts including cancer progression, cell cycle progression, human T cell activation, and human aging.
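
Droplet digital PCR quantification rests on Poisson statistics. If a fraction p of droplets is positive, the mean number of target copies per droplet is lambda = -ln(1 - p). The sketch below converts droplet counts to mtDNA copies per cell. It assumes a known droplet volume, a known reaction volume, and a known number of cells whose lysate went into the reaction. The 0.85 nL and 20 µL defaults are illustrative values, not parameters stated for ddMDM. The abstract also does not describe how ddMDM itself normalizes per cell, and the function names are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1 - positive / total)


def mtdna_copies_per_cell(positive, total, cells_in_reaction,
                          droplet_volume_nl=0.85, reaction_volume_ul=20.0):
    """Scale lambda to the whole reaction, then divide by the number of cells.

    droplet_volume_nl and reaction_volume_ul are illustrative defaults, not
    values taken from the ddMDM protocol.
    """
    lam = copies_per_droplet(positive, total)
    copies_per_ul = lam / (droplet_volume_nl / 1000.0)  # 1 uL = 1000 nL
    return copies_per_ul * reaction_volume_ul / cells_in_reaction


print(round(copies_per_droplet(5000, 15000), 4))      # 0.4055
print(round(mtdna_copies_per_cell(5000, 15000, 10)))  # 954
```

The Poisson correction matters at high template loads. Once many droplets hold more than one copy, simply counting positive droplets would underestimate the copy number.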

    Updated: 2019-11-01
  • Nascent transcript analysis of glucocorticoid crosstalk with TNF defines primary and cooperative inflammatory repression.
    Genome Res. (IF 9.944) Pub Date : 2019-09-15
    Sarah K Sasse,Margaret Gruca,Mary A Allen,Vineela Kadiyala,Tengyao Song,Fabienne Gally,Arnav Gupta,Miles A Pufall,Robin D Dowell,Anthony N Gerber

    The glucocorticoid receptor (NR3C1, also known as GR) binds to specific DNA sequences and directly induces transcription of anti-inflammatory genes that contribute to cytokine repression, frequently in cooperation with NF-κB. Whether inflammatory repression also occurs through local interactions between GR and inflammatory gene regulatory elements has been controversial. Here, using global run-on sequencing (GRO-seq) in human airway epithelial cells, we show that glucocorticoid signaling represses transcription within 10 min. Many repressed regulatory regions reside within "hyper-ChIPable" genomic regions that are subject to dynamic, yet nonspecific, interactions with some antibodies. When this artifact was accounted for, we determined that transcriptional repression does not require local GR occupancy. Instead, widespread transcriptional induction through canonical GR binding sites is associated with reciprocal repression of distal TNF-regulated enhancers through a chromatin-dependent process, as evidenced by chromatin accessibility and motif displacement analysis. Simultaneously, transcriptional induction of key anti-inflammatory effectors is decoupled from primary repression through cooperation between GR and NF-κB at a subset of regulatory regions. Thus, glucocorticoids exert bimodal restraints on inflammation characterized by rapid primary transcriptional repression without local GR occupancy and secondary anti-inflammatory effects resulting from transcriptional cooperation between GR and NF-κB.

    Updated: 2019-11-01
  • Gene expression profiling of single cells from archival tissue with laser-capture microdissection and Smart-3SEQ.
    Genome Res. (IF 9.944) Pub Date : 2019-09-15
    Joseph W Foley,Chunfang Zhu,Philippe Jolivet,Shirley X Zhu,Peipei Lu,Michael J Meaney,Robert B West

    RNA sequencing (RNA-seq) is a sensitive and accurate method for quantifying gene expression. Small samples or those whose RNA is degraded, such as formalin-fixed paraffin-embedded (FFPE) tissue, remain challenging to study with nonspecialized RNA-seq protocols. Here, we present a new method, Smart-3SEQ, that accurately quantifies transcript abundance even with small amounts of total RNA and effectively characterizes small samples extracted by laser-capture microdissection (LCM) from FFPE tissue. We also obtain distinct biological profiles from FFPE single cells, which have been impossible to study with previous RNA-seq protocols, and we use these data to identify possible new macrophage phenotypes associated with the tumor microenvironment. We propose Smart-3SEQ as a highly cost-effective method to enable large gene expression profiling experiments unconstrained by sample size and tissue availability. In particular, Smart-3SEQ's compatibility with FFPE tissue unlocks an enormous number of archived clinical samples; combined with LCM it allows unprecedented studies of small cell populations and single cells isolated by their in situ context.

    Updated: 2019-11-01
  • Global analyses of the dynamics of mammalian microRNA metabolism.
    Genome Res. (IF 9.944) Pub Date : 2019-09-15
    Elena R Kingston,David P Bartel

    Rates of production and degradation together specify microRNA (miRNA) abundance and dynamics. Here, we used approach-to-steady-state metabolic labeling to assess these rates for 176 miRNAs in contact-inhibited mouse embryonic fibroblasts (MEFs), 182 miRNAs in dividing MEFs, and 127 miRNAs in mouse embryonic stem cells (mESCs). MicroRNA duplexes, each comprising a mature miRNA and its passenger strand, are produced at rates as fast as 110 ± 50 copies/cell/min, which exceeds rates reported for any mRNA. These duplexes are rapidly loaded into Argonaute, with <30 min typically required for duplex loading and silencing-complex maturation. Within Argonaute, guide strands have stabilities that vary by 100-fold. Half-lives also vary globally between cell lines, with median values ranging from 11 to 34 h in mESCs and contact-inhibited MEFs, respectively. Moreover, relative half-lives for individual miRNAs vary between cell types, implying the influence of cell-specific factors in dictating turnover rate. The apparent influence of the miRNA regions most important for targeting, together with the effect of one target on miR-7 accumulation, suggests that targets fulfill this role. Analysis of the tailing and trimming of miRNA 3' termini showed that the flux was typically greatest through the isoform tailed with a single uridine, although changes in this flux did not correspond to changes in stability, which suggested that the processes of tailing and trimming might be independent of decay. Together, these results establish a framework for describing the dynamics and regulation of miRNAs throughout their life cycle.
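
Approach-to-steady-state labeling can be summarized by a single first-order rate. Assume production is constant and the species was at steady state when labeling began. The labeled fraction then rises as f(t) = 1 - exp(-k t), so -ln(1 - f) is linear in t with slope k, and the half-life is ln 2 / k. The sketch below is a minimal single-compartment version run on noise-free simulated data. The study's model also covers duplex production and Argonaute loading, which are ignored here, and the function names are hypothetical.

```python
import math


def fit_loss_rate(times_h, labeled_fraction):
    """Least-squares slope through the origin of -ln(1 - f) versus time.

    Assumes f(t) = 1 - exp(-k t): constant production, first-order loss, and
    steady state before labeling starts. In dividing cells, k also absorbs
    dilution by growth.
    """
    ys = [-math.log(1 - f) for f in labeled_fraction]
    return sum(t * y for t, y in zip(times_h, ys)) / sum(t * t for t in times_h)


def half_life(k):
    """Half-life corresponding to a first-order loss rate k."""
    return math.log(2) / k


# Noise-free simulated time course for a species with a 20 h half-life.
k_true = math.log(2) / 20
times = [1, 2, 4, 8, 16, 32]
fractions = [1 - math.exp(-k_true * t) for t in times]
print(round(half_life(fit_loss_rate(times, fractions)), 6))  # 20.0
```

With real measurements, the early time points would also reflect the lag from duplex loading and maturation. A fit through the origin is therefore only a first approximation.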

    Updated: 2019-11-01
  • A-to-I RNA editing contributes to the persistence of predicted damaging mutations in populations.
    Genome Res. (IF 9.944) Pub Date : 2019-09-14
    Te-Lun Mai,Trees-Juen Chuang

    Adenosine-to-inosine (A-to-I) RNA editing is a very common co-/posttranscriptional modification that can lead to A-to-G changes at the RNA level and compensate for G-to-A genomic changes to a certain extent. It has been shown that each healthy individual can carry dozens of missense variants predicted to be severely deleterious. Why strongly detrimental variants are preserved in a population and not eliminated by negative natural selection remains mostly unclear. Here, we ask if RNA editing correlates with the burden of deleterious A/G polymorphisms in a population. Integrating genome and transcriptome sequencing data from 447 human lymphoblastoid cell lines, we show that nonsynonymous editing activities (prevalence/level) are negatively correlated with the deleteriousness of A-to-G genomic changes and positively correlated with that of G-to-A genomic changes within the population. We find a significantly negative correlation between nonsynonymous editing activities and allele frequency of A within the population. This negative editing-allele frequency correlation is particularly strong when editing sites are located in highly important genes/loci. Examinations of deleterious missense variants from the 1000 Genomes Project further show a significantly higher proportion of rare missense mutations for G-to-A changes than for other types of changes. The proportion for G-to-A changes increases with increasing deleterious effects of the changes. Moreover, the deleteriousness of G-to-A changes is significantly positively correlated with the percentage of editing enzyme binding motifs at the variants. Overall, we show that nonsynonymous editing is associated with the increased burden of G-to-A missense mutations in healthy individuals, expanding RNA editing in pathogenomics studies.

    Updated: 2019-11-01
  • Genome and transcriptome sequencing of lung cancers reveal diverse mutational and splicing events.
    Genome Res. (IF 9.944) Pub Date : 2012-10-04
    Jinfeng Liu,William Lee,Zhaoshi Jiang,Zhongqiang Chen,Suchit Jhunjhunwala,Peter M Haverty,Florian Gnad,Yinghui Guan,Houston N Gilbert,Jeremy Stinson,Christiaan Klijn,Joseph Guillory,Deepali Bhatt,Steffan Vartanian,Kimberly Walter,Jocelyn Chan,Thomas Holcomb,Peter Dijkgraaf,Stephanie Johnson,Julie Koeman,John D Minna,Adi F Gazdar,Howard M Stern,Klaus P Hoeflich,Thomas D Wu,Jeff Settleman,Frederic J de Sauvage,Robert C Gentleman,Richard M Neve,David Stokoe,Zora Modrusan,Somasekar Seshagiri,David S Shames,Zemin Zhang

    Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer-specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to the MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon with potential utility as a source of therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.

    Updated: 2019-11-01
  • Genes in a refined Smith-Magenis syndrome critical deletion interval on chromosome 17p11.2 and the syntenic region of the mouse.
    Genome Res. (IF 9.944) Pub Date : 2002-05-09
    Weimin Bi,Jiong Yan,Paweł Stankiewicz,Sung-Sup Park,Katherina Walz,Cornelius F Boerkoel,Lorraine Potocki,Lisa G Shaffer,Koen Devriendt,Małgorzata J M Nowaczyk,Ken Inoue,James R Lupski

    Smith-Magenis syndrome (SMS) is a multiple congenital anomaly/mental retardation syndrome associated with behavioral abnormalities and sleep disturbance. Most patients have the same approximately 4 Mb interstitial genomic deletion within chromosome 17p11.2. To investigate the molecular bases of the SMS phenotype, we constructed BAC/PAC contigs covering the SMS common deletion interval and its syntenic region on mouse chromosome 11. Comparative genome analysis reveals the absence of all three approximately 200-kb SMS-REP low-copy repeats in the mouse and indicates that the evolution of SMS-REPs was accompanied by transposition of adjacent genes. Physical and genetic map comparisons in humans reveal reduced recombination in both sexes. Moreover, by examining the deleted regions in SMS patients with unusual-sized deletions, we refined the minimal Smith-Magenis critical region (SMCR) to an approximately 1.1-Mb genomic interval that is syntenic to an approximately 1.0-Mb region in the mouse. Genes within the SMCR and its mouse syntenic region were identified by homology searches and by gene prediction programs, and their gene structures and expression profiles were characterized. In addition to 12 genes previously mapped, we identified 8 new genes and 10 predicted genes in the SMCR. In the mouse syntenic region of the human SMCR, 16 genes and 6 predicted genes were identified. The SMCR is highly conserved between humans and mice, including 19 genes with the same gene order and orientation. Our findings will facilitate both the identification of gene(s) responsible for the SMS phenotype and the engineering of an SMS mouse model.

    Updated: 2019-11-01
Contents have been reproduced by permission of the publishers.