Current journal: Genome Research
  • Discovery of non-canonical translation initiation sites through mass spectrometric analysis of protein N-termini
    Genome Res. (IF 11.922) Pub Date : 2017-11-21
    Chan Hyun Na; Mustafa Barbhuiya; Min-Sik Kim; Steven Verbruggen; Stephen Eacker; Olga Pletnikova; Juan Troncoso; Marc Halushka; Gerben Menschaert; Christopher Overall; Akhilesh Pandey

    Translation initiation generally occurs at AUG codons in eukaryotes, although it has been shown that non-AUG or non-canonical translation initiation can also occur. However, the evidence for non-canonical translation initiation sites (TISs) is largely indirect and based on ribosome profiling studies. Here, using a strategy specifically designed to enrich N-termini of proteins, we demonstrate that many human proteins are translated at non-canonical TISs. The large majority of TISs that mapped to 5' untranslated regions were non-canonical and led to N-terminal extension of annotated proteins or translation of upstream small open reading frames (uORFs). It has been controversial whether the amino acid corresponding to the start codon is incorporated at the TIS or methionine is still incorporated. We found that methionine was incorporated at almost all non-canonical TISs identified in this study. Comparison of the TISs determined through mass spectrometry with ribosome profiling data revealed that about two-thirds of the novel annotations were indeed supported by the available ribosome profiling data. Sequence conservation across species and, in some cases, a higher abundance of non-canonical TISs than canonical ones suggest that the non-canonical TISs can have biological functions. Overall, this study provides evidence of protein translation initiation at non-canonical TISs and argues that further studies are required to elucidate the functional implications of such non-canonical translation initiation.

    Updated: 2017-11-22
  • ABCA4 midigenes reveal the full splice spectrum of all reported non-canonical splice site variants in Stargardt disease
    Genome Res. (IF 11.922) Pub Date : 2017-11-21
    Riccardo Sangermano; Mubeen Khan; Stéphanie S. Cornelis; Valerie Richelle; Silvia Albert; Duaa Elmelik; Alejandro Garanto; Raheel Qamar; Dorien Lugtenberg; L Ingeborgh van den Born; Rob W.J. Collin; Frans P.M. Cremers

    Stargardt disease is caused by variants in the ABCA4 gene, a significant part of which are non-canonical splice site (NCSS) variants. In case a gene of interest is not expressed in available somatic cells, small genomic fragments carrying potential disease-associated variants are tested for splice abnormalities using in vitro splice assays. We recently discovered that when using small minigenes lacking the proper genomic context, in vitro results do not correlate with splice defects observed in patient cells. We therefore devised a novel strategy in which a bacterial artificial chromosome was employed to generate midigenes, splice vectors of varying lengths (up to 11.7 kb) covering almost the entire ABCA4 gene. These midigenes were used to analyze the effect of all 44 reported and 3 novel NCSS variants on ABCA4 pre-mRNA splicing. Intriguingly, multi-exon skipping events were observed, as well as exon elongation and intron retention. The analysis of all reported NCSS variants in ABCA4 allowed us to reveal the nature of aberrant splicing events and to classify the severity of these mutations based on the residual fraction of wild-type mRNA. Our strategy to generate large overlapping splice vectors carrying multiple exons, creating a toolbox for robust and high-throughput analysis of splice variants, can be applied to all human genes.

    Updated: 2017-11-22
  • Cre-dependent Cas9-expressing pigs enable efficient in vivo genome editing
    Genome Res. (IF 11.922) Pub Date : 2017-11-16
    Kepin Wang; Qin Jin; Degong Ruan; Yi Yang; Qishuai Liu; Han Wu; Zhiwei Zhou; Zhen Ouyang; Zhaoming Liu; Yu Zhao; Bentian Zhao; Quanjun Zhang; Jiangyun Peng; Chengdan Lai; Nana Fan; Yanhui Liang; Ting Lan; Nan Li; Xiaoshan Wang; Xinlu Wang; Yong Fan; Pieter A. Doevendans; Joost P.G. Sluijter; Pentao Liu; Xiaoping Li; Liangxue Lai

    Despite being time-consuming and costly, generating genome-edited pigs holds great promise for agricultural, biomedical, and pharmaceutical applications. To further facilitate genome editing in pigs, we report here establishment of a pig line with Cre-inducible Cas9 expression that allows a variety of ex vivo genome editing in fibroblast cells including single- and multigene modifications, chromosome rearrangements, and efficient in vivo genetic modifications. As a proof of principle, we were able to simultaneously inactivate five tumor suppressor genes (TP53, PTEN, APC, BRCA1, and BRCA2) and activate one oncogene (KRAS), achieved by delivering Cre recombinase and sgRNAs, which caused rapid lung tumor development. The efficient genome editing shown here demonstrates that these pigs can serve as a powerful tool for dissecting in vivo gene functions and biological processes in a temporal manner and for streamlining the production of genome-edited pigs for disease modeling.

    Updated: 2017-11-16
  • Quantitative RNA-seq meta analysis of alternative exon usage in C. elegans.
    Genome Res. (IF 11.922) Pub Date : 2017-10-31
    Nicolas J Tourasse; Jonathan R. M. Millet; Denis Dupuy

    Almost 20 years after the completion of the C. elegans genome sequence, gene structure annotation is still an ongoing process with new evidence for gene variants still being regularly uncovered by additional in-depth transcriptome studies. While alternative splice forms can allow a single gene to encode several functional isoforms, the question of how much spurious splicing is tolerated is still heavily debated. Here we gathered a compendium of 1682 publicly available C. elegans RNA-seq data sets to increase the dynamic range of detection of RNA isoforms, and obtained robust measurements of the relative abundance of each splicing event. While most of the splicing reads come from reproducibly detected splicing events, a large fraction of purported junctions is only supported by a very low number of reads. We devised an automated curation method that takes into account the expression level of each gene to discriminate robust splicing events from potential biological noise. We found that rarely used splice sites disproportionately come from highly expressed genes and are significantly less conserved in other nematode genomes than splice sites with a higher usage frequency. Our increased detection power confirmed trans-splicing for at least 84% of C. elegans protein coding genes. The genes for which trans-splicing was not observed are overwhelmingly low expression genes, suggesting that the mechanism is pervasive but not fully captured by organism-wide RNA-seq. We generated annotated gene models including quantitative exon usage information for the entire C. elegans genome. This allows users to visualize at a glance the relative expression of each isoform for their gene of interest.

    Updated: 2017-11-16
  • An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics
    Genome Res. (IF 11.922) Pub Date : 2017-11-15
    Ulrich Omasits; Adithi R. Varadarajan; Michael Schmid; Sandra Goetze; Damianos Melidis; Marc Bourqui; Olga Nikolayeva; Maxime Québatte; Andrea Patrignani; Christoph Dehio; Juerg E. Frey; Mark D. Robinson; Bernd Wollscheid; Christian H. Ahrens

    Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.

    Updated: 2017-11-16
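The "modified six-frame translation considering alternative start codons" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the iPtgxDB software: the function name `find_orfs`, the start-codon set (ATG, GTG, TTG), the minimum length, and the choice to write every initiator residue as M are all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, starts=("ATG", "GTG", "TTG"), min_aa=2):
    """Scan all six reading frames for start-to-stop ORFs.

    Hypothetical helper: the start codon set and min_aa are assumptions.
    Returns (strand, frame, start_index, peptide) tuples; for the "-"
    strand, start_index refers to the reverse-complemented sequence.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            peptide = []
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                aa = CODON_TABLE.get(codon, "X")  # X for ambiguous bases
                if start is None:
                    if codon in starts:
                        # Assumption: the initiator is written as M
                        # whichever start codon was used.
                        start = i
                        peptide = ["M"]
                elif aa == "*":
                    if len(peptide) >= min_aa:
                        orfs.append((strand, frame, start, "".join(peptide)))
                    start = None
                else:
                    peptide.append(aa)
    return orfs


# Toy sequence with a single GTG-initiated ORF in forward frame 2.
toy_orfs = find_orfs("AAGTGAAACCCTAAGG")
```

Only the first start codon after each stop is used, so each reported ORF is the longest one ending at its stop codon; ORFs that run off the end of the sequence without a stop are dropped.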
  • Rapid molecular assays to study human centromere genomics
    Genome Res. (IF 11.922) Pub Date : 2017-11-15
    Rafael Contreras-Galindo; Sabrina Fischer; Anjan K. Saha; John D. Lundy; Patrick W. Cervantes; Mohamad Mourad; Claire Wang; Brian Qian; Manhong Dai; Fan Meng; Arul Chinnaiyan; Gilbert S. Omenn; Mark H. Kaplan; David M. Markovitz

    The centromere is the structural unit responsible for the faithful segregation of chromosomes. Although regulation of centromeric function by epigenetic factors has been well-studied, the contributions of the underlying DNA sequences have been much less well defined, and existing methodologies for studying centromere genomics in biology are laborious. We have identified specific markers in the centromere of 23 of the 24 human chromosomes that allow for rapid PCR assays capable of capturing the genomic landscape of human centromeres at a given time. Use of this genetic strategy can also delineate which specific centromere arrays in each chromosome drive the recruitment of epigenetic modulators. We further show that, surprisingly, loss and rearrangement of DNA in centromere 21 is associated with trisomy 21. This new approach can thus be used to rapidly take a snapshot of the genetics and epigenetics of each specific human centromere in nondisjunction disorders and other biological settings.

    Updated: 2017-11-16
  • Chromatin accessibility dynamics reveal novel functional enhancers in C. elegans
    Genome Res. (IF 11.922) Pub Date : 2017-11-15
    Aaron C. Daugherty; Robin W. Yeo; Jason D. Buenrostro; William J. Greenleaf; Anshul Kundaje; Anne Brunet

    Chromatin accessibility, a crucial component of genome regulation, has primarily been studied in homogeneous and simple systems, such as isolated cell populations or early-development models. Whether chromatin accessibility can be assessed in complex, dynamic systems in vivo with high sensitivity remains largely unexplored. In this study, we use ATAC-seq to identify chromatin accessibility changes in a whole animal, the model organism Caenorhabditis elegans, from embryogenesis to adulthood. Chromatin accessibility changes between developmental stages are highly reproducible, recapitulate histone modification changes, and reveal key regulatory aspects of the epigenomic landscape throughout organismal development. We find that over 5000 distal noncoding regions exhibit dynamic changes in chromatin accessibility between developmental stages and could thereby represent putative enhancers. When tested in vivo, several of these putative enhancers indeed drive novel cell-type- and temporal-specific patterns of expression. Finally, by integrating transcription factor binding motifs in a machine learning framework, we identify EOR-1 as a unique transcription factor that may regulate chromatin dynamics during development. Our study provides a unique resource for C. elegans, a system in which the prevalence and importance of enhancers remains poorly characterized, and demonstrates the power of using whole organism chromatin accessibility to identify novel regulatory regions in complex systems.

    Updated: 2017-11-16
  • Nanopore sequencing of complex genomic rearrangements in yeast reveals mechanisms of repeat-mediated double-strand break repair
    Genome Res. (IF 11.922) Pub Date : 2017-11-07
    Ryan J. McGinty; Rachel G. Rubinstein; Alexander J. Neil; Margaret Dominska; Denis Kiktev; Thomas D. Petes; Sergei M. Mirkin

    Improper DNA double-strand break (DSB) repair results in complex genomic rearrangements (CGRs) in many cancers and various congenital disorders in humans. Trinucleotide repeat sequences, such as (GAA)n repeats in Friedreich's ataxia, (CTG)n repeats in myotonic dystrophy, and (CGG)n repeats in fragile X syndrome, are also subject to double-strand breaks within the repetitive tract followed by DNA repair. Mapping the outcomes of CGRs is important for understanding their causes and potential phenotypic effects. However, high-resolution mapping of CGRs has traditionally been a laborious and highly skilled process. Recent advances in long-read DNA sequencing technologies, specifically Nanopore sequencing, have made possible the rapid identification of CGRs with single base pair resolution. Here, we have used whole-genome Nanopore sequencing to characterize several CGRs that originated from naturally occurring DSBs at (GAA)n microsatellites in Saccharomyces cerevisiae. These data gave us important insights into the mechanisms of DSB repair leading to CGRs.

    Updated: 2017-11-16
  • A novel approach for data integration and disease subtyping
    Genome Res. (IF 11.922) Pub Date : 2017-10-24
    Tin Nguyen; Rebecca Tagett; Diana Diaz; Sorin Draghici

    Advances in high-throughput technologies allow for measurements of many types of omics data, yet the meaningful integration of several different data types remains a significant challenge. Another important and difficult problem is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. Here we present a novel approach, called perturbation clustering for data integration and disease subtyping (PINS), which is able to address both challenges. The framework has been validated on thousands of cancer samples, using gene expression, DNA methylation, noncoding microRNA, and copy number variation data available from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. This simultaneous subtyping approach accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. The results were obtained from genome-scale molecular data without any other type of prior knowledge. The approach is sufficiently general to replace existing unsupervised clustering approaches outside the scope of bio-medical research, with the additional ability to integrate multiple types of data.

    Updated: 2017-11-15
  • Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations
    Genome Res. (IF 11.922) Pub Date : 2017-10-27
    Zoe June Assaf; Jane Park; Susanne Tilk; Mark L Siegal; Dmitri Petrov

    Mutations provide the raw material of evolution, and thus our ability to study evolution depends fundamentally on having precise measurements of mutational rates and patterns. We generate a data set for this purpose using (1) de novo mutations from mutation accumulation experiments and (2) extremely rare polymorphisms from natural populations. The first, mutation accumulation (MA) lines, are the product of maintaining flies in tiny populations for many generations, thereby rendering natural selection ineffective and allowing new mutations to accrue in the genome. The second, rare genetic variation from natural populations, allows the study of mutation because extremely rare polymorphisms are relatively unaffected by the filter of natural selection. We use both methods in Drosophila melanogaster, first generating our own novel data set of sequenced MA lines and performing a meta-analysis of all published MA mutations (∼2000 events) and then identifying a high-quality set of ∼70,000 extremely rare (≤0.1%) polymorphisms that are fully validated with resequencing. We use these data sets to precisely measure mutational rates and patterns. Highlights of our results include a high rate of multinucleotide mutation events at both short (∼5 bp) and long (∼1 kb) genomic distances, evidence that mutation drives GC content lower in already GC-poor regions, and the use of our precise context-dependent mutation rates to predict long-term evolutionary patterns at synonymous sites. We also show that de novo mutations from independent MA experiments display similar patterns of single nucleotide mutation and closely match the patterns of mutation found in natural populations.

    Updated: 2017-11-15
  • Convergent origination of a Drosophila-like dosage compensation mechanism in a reptile lineage
    Genome Res. (IF 11.922) Pub Date : 2017-11-13
    Ray Marin; Diego Cortez; Francesco Lamanna; Madapura M. Pradeepa; Evgeny Leushkin; Philippe Julien; Angélica Liechti; Jean Halbert; Thoomke Brüning; Katharina Mössinger; Timo Trefzer; Christian Conrad; Halie N. Kerver; Juli Wade; Patrick Tschopp; Henrik Kaessmann

    Sex chromosomes differentiated from different ancestral autosomes in various vertebrate lineages. Here, we trace the functional evolution of the XY Chromosomes of the green anole lizard (Anolis carolinensis), on the basis of extensive high-throughput genome, transcriptome and histone modification sequencing data and revisit dosage compensation evolution in representative mammals and birds with substantial new expression data. Our analyses show that Anolis sex chromosomes represent an ancient XY system that originated at least ≈160 million years ago in the ancestor of Iguania lizards, shortly after the separation from the snake lineage. The age of this system approximately coincides with the ages of the avian and two mammalian sex chromosomes systems. To compensate for the almost complete Y Chromosome degeneration, X-linked genes have become twofold up-regulated, restoring ancestral expression levels. The highly efficient dosage compensation mechanism of Anolis represents the only vertebrate case identified so far to fully support Ohno's original dosage compensation hypothesis. Further analyses reveal that X up-regulation occurs only in males and is mediated by a male-specific chromatin machinery that leads to global hyperacetylation of histone H4 at lysine 16 specifically on the X Chromosome. The green anole dosage compensation mechanism is highly reminiscent of that of the fruit fly, Drosophila melanogaster. Altogether, our work unveils the convergent emergence of a Drosophila-like dosage compensation mechanism in an ancient reptilian sex chromosome system and highlights that the evolutionary pressures imposed by sex chromosome dosage reductions in different amniotes were resolved in fundamentally different ways.

    Updated: 2017-11-13
  • Genome-wide discovery of active regulatory elements and transcription factor footprints in Caenorhabditis elegans using DNase-seq
    Genome Res. (IF 11.922) Pub Date : 2017-10-26
    Paul Sternberg; Margaret Ho; Porfirio Quintero-Cadena

    Deep sequencing of size-selected DNase I–treated chromatin (DNase-seq) allows high-resolution measurement of chromatin accessibility to DNase I cleavage, permitting identification of de novo active cis-regulatory modules (CRMs) and individual transcription factor (TF) binding sites. We adapted DNase-seq to nuclei isolated from C. elegans embryos and L1 arrest larvae to generate high-resolution maps of TF binding. Over half of embryonic DNase I hypersensitive sites (DHSs) were annotated as noncoding, with 24% in intergenic, 12% in promoters, and 28% in introns, with similar statistics observed in L1 arrest larvae. Noncoding DHSs are highly conserved and enriched in marks of enhancer activity and transcription. We validated noncoding DHSs against known enhancers from myo-2, myo-3, hlh-1, elt-2, and lin-26/lir-1 and recapitulated 15 of 17 known enhancers. We then mined DNase-seq data to identify putative active CRMs and TF footprints. Using DNase-seq data improved predictions of tissue-specific expression compared with motifs alone. In a pilot functional test, 10 of 15 DHSs from pha-4, icl-1, and ceh-13 drove reporter gene expression in transgenic C. elegans. Overall, we provide experimental annotation of 26,644 putative CRMs in the embryo containing 55,890 TF footprints, as well as 15,841 putative CRMs in the L1 arrest larvae containing 32,685 TF footprints.

    Updated: 2017-11-10
  • Comparative genome analysis of programmed DNA elimination in nematodes
    Genome Res. (IF 11.922) Pub Date : 2017-11-08
    Jianbin Wang; Shenghan Gao; Yulia Mostovoy; Yuanyuan Kang; Maxim Zagoskin; Yongqiao Sun; Bing Zhang; Laura K. White; Alice Easton; Thomas B. Nutman; Pui-Yan Kwok; Songnian Hu; Martin K. Nielsen; Richard E. Davis

    Programmed DNA elimination is a developmentally regulated process leading to the reproducible loss of specific genomic sequences. DNA elimination occurs in unicellular ciliates and a variety of metazoans, including invertebrates and vertebrates. In metazoa, DNA elimination typically occurs in somatic cells during early development, leaving the germline genome intact. Reference genomes for metazoa that undergo DNA elimination are not available. Here, we generated germline and somatic reference genome sequences of the DNA eliminating pig parasitic nematode Ascaris suum and the horse parasite Parascaris univalens. In addition, we carried out in-depth analyses of DNA elimination in the parasitic nematode of humans, Ascaris lumbricoides, and the parasitic nematode of dogs, Toxocara canis. Our analysis of nematode DNA elimination reveals that in all species, repetitive sequences (that differ among the genera) and germline-expressed genes (approximately 1000–2000 or 5%–10% of the genes) are eliminated. Thirty-five percent of these eliminated genes are conserved among these nematodes, defining a core set of eliminated genes that are preferentially expressed during spermatogenesis. Our analysis supports the view that DNA elimination in nematodes silences germline-expressed genes. Over half of the chromosome break sites are conserved between Ascaris and Parascaris, whereas only 10% are conserved in the more divergent T. canis. Analysis of the chromosomal breakage regions suggests a sequence-independent mechanism for DNA breakage followed by telomere healing, with the formation of more accessible chromatin in the break regions prior to DNA elimination. Our genome assemblies and annotations also provide comprehensive resources for analysis of DNA elimination, parasitology research, and comparative nematode genome and epigenome studies.

    Updated: 2017-11-08
  • GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly
    Genome Res. (IF 11.922) Pub Date : 2017-11-02
    Daniel L. Cameron; Jan Schröder; Jocelyn Sietsma Penington; Hongdo Do; Ramyar Molania; Alexander Dobrovic; Terence P. Speed; Anthony T. Papenfuss

    The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.

    Updated: 2017-11-02
  • Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences
    Genome Res. (IF 11.922) Pub Date : 2017-11-02
    Josh T. Cuperus; Benjamin Groves; Anna Kuchina; Alexander B. Rosenberg; Nebojsa Jojic; Stanley Fields; Georg Seelig

    Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs and native S. cerevisiae 5′ UTRs. The model was additionally used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.

    Updated: 2017-11-02
  • Single-cell gene expression analysis reveals regulators of distinct cell subpopulations among developing human neurons
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Jiaxu Wang; Piroon Jenjaroenpun; Akshay Bhinge; Vladimir Espinosa Angarica; Antonio Del Sol; Intawat Nookaew; Vladimir A. Kuznetsov; Lawrence W. Stanton

    The stochastic dynamics and regulatory mechanisms that govern differentiation of individual human neural precursor cells (NPC) into mature neurons are currently not fully understood. Here, we used single-cell RNA-sequencing (scRNA-seq) of developing neurons to dissect/identify NPC subtypes and critical developmental stages of alternative lineage specifications. This study comprises an unsupervised, high-resolution strategy for identifying cell developmental bifurcations, tracking the stochastic transcript kinetics of the subpopulations, elucidating regulatory networks, and finding key regulators. Our data revealed the bifurcation and developmental tracks of the two NPC subpopulations, and we captured an early (24 h) transition phase that leads to alternative neuronal specifications. The consequent up-regulation and down-regulation of stage- and subpopulation-specific gene groups during the course of maturation revealed biological insights with regard to key regulatory transcription factors and lincRNAs that control cellular programs in the identified neuronal subpopulations.

    Updated: 2017-11-01
  • Assessing the reliability of spike-in normalization for analyses of single-cell RNA sequencing data
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Aaron T.L. Lun; Fernando J. Calero-Nieto; Liora Haim-Vilmovsky; Berthold Göttgens; John C. Marioni

    By profiling the transcriptomes of individual cells, single-cell RNA sequencing provides unparalleled resolution to study cellular heterogeneity. However, this comes at the cost of high technical noise, including cell-specific biases in capture efficiency and library generation. One strategy for removing these biases is to add a constant amount of spike-in RNA to each cell and to scale the observed expression values so that the coverage of spike-in transcripts is constant across cells. This approach has previously been criticized as its accuracy depends on the precise addition of spike-in RNA to each sample. Here, we perform mixture experiments using two different sets of spike-in RNA to quantify the variance in the amount of spike-in RNA added to each well in a plate-based protocol. We also obtain an upper bound on the variance due to differences in behavior between the two spike-in sets. We demonstrate that both factors are small contributors to the total technical variance and have only minor effects on downstream analyses, such as detection of highly variable genes and clustering. Our results suggest that scaling normalization using spike-in transcripts is reliable enough for routine use in single-cell RNA sequencing data analyses.

    Updated: 2017-11-01
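The scaling step described in the abstract above (adjust each cell's expression values so that spike-in coverage is constant across cells) can be sketched as follows. This is a toy illustration, not the authors' code: the function names, the use of total spike-in counts as the coverage measure, and the centering of size factors at a mean of 1 are assumptions.

```python
def spike_in_size_factors(spike_counts):
    """Compute one size factor per cell from its total spike-in coverage.

    spike_counts: list of per-cell lists of spike-in transcript counts.
    Assumption: factors are centered so that their mean is 1.
    """
    totals = [sum(cell) for cell in spike_counts]
    if any(t <= 0 for t in totals):
        raise ValueError("every cell needs nonzero spike-in coverage")
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def normalize(gene_counts, size_factors):
    """Divide each cell's gene counts by that cell's size factor."""
    return [[c / sf for c in cell]
            for cell, sf in zip(gene_counts, size_factors)]


# Toy data: three cells that differ only in capture efficiency (1x, 2x, 0.5x).
spikes = [[10, 30], [20, 60], [5, 15]]
genes = [[100, 4], [200, 8], [50, 2]]

factors = spike_in_size_factors(spikes)
normalized = normalize(genes, factors)
scaled_spike_totals = [sum(cell) / sf for cell, sf in zip(spikes, factors)]
```

In this toy case the three cells end up with identical normalized values, because their raw differences were entirely technical. The approach relies on the same amount of spike-in RNA having been added to every cell, which is the assumption the study evaluates.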
  • Disease-specific biases in alternative splicing and tissue-specific dysregulation revealed by multitissue profiling of lymphocyte gene expression in type 1 diabetes
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Jeremy R.B. Newman; Ana Conesa; Matthew Mika; Felicia N. New; Suna Onengut-Gumuscu; Mark A. Atkinson; Stephen S. Rich; Lauren M. McIntyre; Patrick Concannon

    Genome-wide association studies (GWAS) have identified multiple, shared allelic associations with many autoimmune diseases. However, the pathogenic contributions of variants residing in risk loci remain unresolved. The location of the majority of shared disease-associated variants in noncoding regions suggests they contribute to risk of autoimmunity through effects on gene expression in the immune system. In the current study, we test this hypothesis by applying RNA sequencing to CD4+, CD8+, and CD19+ lymphocyte populations isolated from 81 subjects with type 1 diabetes (T1D). We characterize and compare the expression patterns across these cell types for three gene sets: all genes, the set of genes implicated in autoimmune disease risk by GWAS, and the subset of these genes specifically implicated in T1D. We performed RNA sequencing and aligned the reads to both the human reference genome and a catalog of all possible splicing events developed from the genome, thereby providing a comprehensive evaluation of the roles of gene expression and alternative splicing (AS) in autoimmunity. Autoimmune candidate genes displayed greater expression specificity in the three lymphocyte populations relative to other genes, with significantly increased levels of splicing events, particularly those predicted to have substantial effects on protein isoform structure and function (e.g., intron retention, exon skipping). The majority of single-nucleotide polymorphisms within T1D-associated loci were also associated with one or more cis-expression quantitative trait loci (cis-eQTLs) and/or splicing eQTLs. Our findings highlight a substantial, and previously underrecognized, role for AS in the pathogenesis of autoimmune disorders and particularly for T1D.

    Updated: 2017-11-01
  • Nascent RNA sequencing reveals a dynamic global transcriptional response at genes and enhancers to the natural medicinal compound celastrol
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Noah Dukler; Gregory T. Booth; Yi-Fei Huang; Nathaniel Tippens; Colin T. Waters; Charles G. Danko; John T. Lis; Adam Siepel

    Most studies of responses to transcriptional stimuli measure changes in cellular mRNA concentrations. By sequencing nascent RNA instead, it is possible to detect changes in transcription in minutes rather than hours and thereby distinguish primary from secondary responses to regulatory signals. Here, we describe the use of PRO-seq to characterize the immediate transcriptional response in human cells to celastrol, a compound derived from traditional Chinese medicine that has potent anti-inflammatory, tumor-inhibitory, and obesity-controlling effects. Celastrol is known to elicit a cellular stress response resembling the response to heat shock, but the transcriptional basis of this response remains unclear. Our analysis of PRO-seq data for K562 cells reveals dramatic transcriptional effects soon after celastrol treatment at a broad collection of both coding and noncoding transcription units. This transcriptional response occurred in two major waves, one within 10 min, and a second 40–60 min after treatment. Transcriptional activity was generally repressed by celastrol, but one distinct group of genes, enriched for roles in the heat shock response, displayed strong activation. Using a regression approach, we identified key transcription factors that appear to drive these transcriptional responses, including members of the E2F and RFX families. We also found sequence-based evidence that particular transcription factors drive the activation of enhancers. We observed increased polymerase pausing at both genes and enhancers, suggesting that pause release may be widely inhibited during the celastrol response. Our study demonstrates that a careful analysis of PRO-seq time-course data can disentangle key aspects of a complex transcriptional response, and it provides new insights into the activity of a powerful pharmacological agent.

    Updated: 2017-11-01
  • Altered hydroxymethylation is seen at regulatory regions in pancreatic cancer and regulates oncogenic pathways
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Sanchari Bhattacharyya; Kith Pradhan; Nathaniel Campbell; Jozef Mazdo; Aparna Vasantkumar; Shahina Maqbool; Tushar D. Bhagat; Sonal Gupta; Masako Suzuki; Yiting Yu; John M. Greally; Ulrich Steidl; James Bradner; Meelad Dawlaty; Lucy Godley; Anirban Maitra; Amit Verma

    Transcriptional deregulation of oncogenic pathways is a hallmark of cancer and can be due to epigenetic alterations. 5-Hydroxymethylcytosine (5-hmC) is an epigenetic modification that has not been studied in pancreatic cancer. Genome-wide analysis of 5-hmC-enriched loci with hmC-seal was conducted in a cohort of low-passage pancreatic cancer cell lines, primary patient-derived xenografts, and pancreatic controls and revealed strikingly altered patterns in neoplastic tissues. Differentially hydroxymethylated regions preferentially affected known regulatory regions of the genome, specifically overlapping with known H3K4me1 enhancers. Furthermore, base pair resolution analysis of cytosine methylation and hydroxymethylation with oxidative bisulfite sequencing was conducted and correlated with chromatin accessibility by ATAC-seq and gene expression by RNA-seq in pancreatic cancer and control samples. 5-hmC was specifically enriched at open regions of chromatin, and gain of 5-hmC was correlated with up-regulation of the cognate transcripts, including many oncogenic pathways implicated in pancreatic neoplasia, such as MYC, KRAS, VEGFA, and BRD4. Specifically, BRD4 was overexpressed and acquired 5-hmC at enhancer regions in the majority of neoplastic samples. Functionally, acquisition of 5-hmC at the BRD4 promoter was associated with an increase in transcript expression in reporter assays and primary samples. Furthermore, blockade of BRD4 inhibited pancreatic cancer growth in vivo. In summary, redistribution of 5-hmC and preferential enrichment at oncogenic enhancers is a novel regulatory mechanism in human pancreatic cancer.

    Updated: 2017-11-01
  • Co-expression networks reveal the tissue-specific regulation of transcription and splicing
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Ashis Saha; Yungil Kim; Ariel D.H. Gewirtz; Brian Jo; Chuan Gao; Ian C. McDowell; The GTEx Consortium; Barbara E. Engelhardt; Alexis Battle; François Aguet; Kristin G. Ardlie; Beryl B. Cummings; Ellen T. Gelfand; Gad Getz; Kane Hadley; Robert E. Handsaker; Katherine H. Huang; Seva Kashin; Konrad J. Karczewski; Monkol Lek; Xiao Li; Daniel G. MacArthur; Jared L. Nedzel; Duyen T. Nguyen; Michael S. Noble; Ayellet V. Segrè; Casandra A. Trowbridge; Taru Tukiainen; Nathan S. Abell; Brunilda Balliu; Ruth Barshir; Omer Basha; Alexis Battle; Gireesh K. Bogu; Andrew Brown; Christopher D. Brown; Stephane E. Castel; Lin S. Chen; Colby Chiang; Donald F. Conrad; Nancy J. Cox; Farhan N. Damani; Joe R. Davis; Olivier Delaneau; Emmanouil T. Dermitzakis; Barbara E. Engelhardt; Eleazar Eskin; Pedro G. Ferreira; Laure Frésard; Eric R. Gamazon; Diego Garrido-Martín; Ariel D.H. Gewirtz; Genna Gliner; Michael J. Gloudemans; Roderic Guigo; Ira M. Hall; Buhm Han; Yuan He; Farhad Hormozdiari; Cedric Howald; Hae Kyung Im; Brian Jo; Eun Yong Kang; Yungil Kim; Sarah Kim-Hellmuth; Tuuli Lappalainen; Gen Li; Xin Li; Boxiang Liu; Serghei Mangul; Mark I. McCarthy; Ian C. McDowell; Pejman Mohammadi; Jean Monlong; Stephen B. Montgomery; Manuel Muñoz-Aguirre; Anne W. Ndungu; Dan L. Nicolae; Andrew B. Nobel; Meritxell Oliva; Halit Ongen; John J. Palowitch; Nikolaos Panousis; Panagiotis Papasaikas; YoSon Park; Princy Parsana; Anthony J. Payne; Christine B. Peterson; Jie Quan; Ferran Reverter; Chiara Sabatti; Ashis Saha; Michael Sammeth; Alexandra J. Scott; Andrey A. Shabalin; Reza Sodaei; Matthew Stephens; Barbara E. Stranger; Benjamin J. Strober; Jae Hoon Sul; Emily K. Tsang; Sarah Urbut; Martijn van de Bunt; Gao Wang; Xiaoquan Wen; Fred A. Wright; Hualin S. Xi; Esti Yeger-Lotem; Zachary Zappala; Judith B. Zaugg; Yi-Hui Zhou; Joshua M. Akey; Daniel Bates; Joanne Chan; Lin S. Chen; Melina Claussnitzer; Kathryn Demanelis; Morgan Diegel; Jennifer A. Doherty; Andrew P. Feinberg; Marian S. 
Fernando; Jessica Halow; Kasper D. Hansen; Eric Haugen; Peter F. Hickey; Lei Hou; Farzana Jasmine; Ruiqi Jian; Lihua Jiang; Audra Johnson; Rajinder Kaul; Manolis Kellis; Muhammad G. Kibriya; Kristen Lee; Jin Billy Li; Qin Li; Xiao Li; Jessica Lin; Shin Lin; Sandra Linder; Caroline Linke; Yaping Liu; Matthew T. Maurano; Benoit Molinie; Stephen B. Montgomery; Jemma Nelson; Fidencio J. Neri; Meritxell Oliva; Yongjin Park; Brandon L. Pierce; Nicola J. Rinaldi; Lindsay F. Rizzardi; Richard Sandstrom; Andrew Skol; Kevin S. Smith; Michael P. Snyder; John Stamatoyannopoulos; Barbara E. Stranger; Hua Tang; Emily K. Tsang; Li Wang; Meng Wang; Nicholas Van Wittenberghe; Fan Wu; Rui Zhang; Concepcion R. Nierras; Philip A. Branton; Latarsha J. Carithers; Ping Guan; Helen M. Moore; Abhi Rao; Jimmie B. Vaught; Sarah E. Gould; Nicole C. Lockart; Casey Martin; Jeffery P. Struewing; Simona Volpi; Anjene M. Addington; Susan E. Koester; A. Roger Little; Lori E. Brigham; Richard Hasz; Marcus Hunter; Christopher Johns; Mark Johnson; Gene Kopen; William F. Leinweber; John T. Lonsdale; Alisa McDonald; Bernadette Mestichelli; Kevin Myer; Brian Roe; Michael Salvatore; Saboor Shad; Jeffrey A. Thomas; Gary Walters; Michael Washington; Joseph Wheeler; Jason Bridge; Barbara A. Foster; Bryan M. Gillard; Ellen Karasik; Rachna Kumar; Mark Miklos; Michael T. Moser; Scott D. Jewell; Robert G. Montroy; Daniel C. Rohrer; Dana R. Valley; David A. Davis; Deborah C. Mash; Anita H. Undale; Anna M. Smith; David E. Tabor; Nancy V. Roche; Jeffrey A. McLean; Negin Vatanian; Karna L. Robinson; Leslie Sobin; Mary E. Barcus; Kimberly M. Valentino; Liqun Qi; Steven Hunter; Pushpa Hariharan; Shilpi Singh; Ki Sung Um; Takunda Matose; Maria M. Tomaszewski; Laura K. Barker; Maghboeba Mosavel; Laura A. Siminoff; Heather M. Traino; Paul Flicek; Thomas Juettemann; Magali Ruffier; Dan Sheppard; Kieron Taylor; Stephen J. Trevanion; Daniel R. Zerbino; Brian Craft; Mary Goldman; Maximilian Haeussler; W. 
James Kent; Christopher M. Lee; Benedict Paten; Kate R. Rosenbloom; John Vivian; Jingchun Zhu

    Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.

    Updated: 2017-11-01
  • Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Fan Yang; Jiebiao Wang; The GTEx Consortium; Brandon L. Pierce; Lin S. Chen; François Aguet; Kristin G. Ardlie; Beryl B. Cummings; Ellen T. Gelfand; Gad Getz; Kane Hadley; Robert E. Handsaker; Katherine H. Huang; Seva Kashin; Konrad J. Karczewski; Monkol Lek; Xiao Li; Daniel G. MacArthur; Jared L. Nedzel; Duyen T. Nguyen; Michael S. Noble; Ayellet V. Segrè; Casandra A. Trowbridge; Taru Tukiainen; Nathan S. Abell; Brunilda Balliu; Ruth Barshir; Omer Basha; Alexis Battle; Gireesh K. Bogu; Andrew Brown; Christopher D. Brown; Stephane E. Castel; Lin S. Chen; Colby Chiang; Donald F. Conrad; Nancy J. Cox; Farhan N. Damani; Joe R. Davis; Olivier Delaneau; Emmanouil T. Dermitzakis; Barbara E. Engelhardt; Eleazar Eskin; Pedro G. Ferreira; Laure Frésard; Eric R. Gamazon; Diego Garrido-Martín; Ariel D.H. Gewirtz; Genna Gliner; Michael J. Gloudemans; Roderic Guigo; Ira M. Hall; Buhm Han; Yuan He; Farhad Hormozdiari; Cedric Howald; Hae Kyung Im; Brian Jo; Eun Yong Kang; Yungil Kim; Sarah Kim-Hellmuth; Tuuli Lappalainen; Gen Li; Xin Li; Boxiang Liu; Serghei Mangul; Mark I. McCarthy; Ian C. McDowell; Pejman Mohammadi; Jean Monlong; Stephen B. Montgomery; Manuel Muñoz-Aguirre; Anne W. Ndungu; Dan L. Nicolae; Andrew B. Nobel; Meritxell Oliva; Halit Ongen; John J. Palowitch; Nikolaos Panousis; Panagiotis Papasaikas; YoSon Park; Princy Parsana; Anthony J. Payne; Christine B. Peterson; Jie Quan; Ferran Reverter; Chiara Sabatti; Ashis Saha; Michael Sammeth; Alexandra J. Scott; Andrey A. Shabalin; Reza Sodaei; Matthew Stephens; Barbara E. Stranger; Benjamin J. Strober; Jae Hoon Sul; Emily K. Tsang; Sarah Urbut; Martijn van de Bunt; Gao Wang; Xiaoquan Wen; Fred A. Wright; Hualin S. Xi; Esti Yeger-Lotem; Zachary Zappala; Judith B. Zaugg; Yi-Hui Zhou; Joshua M. Akey; Daniel Bates; Joanne Chan; Lin S. Chen; Melina Claussnitzer; Kathryn Demanelis; Morgan Diegel; Jennifer A. Doherty; Andrew P. Feinberg; Marian S. Fernando; Jessica Halow; Kasper D. Hansen; Eric Haugen; Peter F. 
Hickey; Lei Hou; Farzana Jasmine; Ruiqi Jian; Lihua Jiang; Audra Johnson; Rajinder Kaul; Manolis Kellis; Muhammad G. Kibriya; Kristen Lee; Jin Billy Li; Qin Li; Xiao Li; Jessica Lin; Shin Lin; Sandra Linder; Caroline Linke; Yaping Liu; Matthew T. Maurano; Benoit Molinie; Stephen B. Montgomery; Jemma Nelson; Fidencio J. Neri; Meritxell Oliva; Yongjin Park; Brandon L. Pierce; Nicola J. Rinaldi; Lindsay F. Rizzardi; Richard Sandstrom; Andrew Skol; Kevin S. Smith; Michael P. Snyder; John Stamatoyannopoulos; Barbara E. Stranger; Hua Tang; Emily K. Tsang; Li Wang; Meng Wang; Nicholas Van Wittenberghe; Fan Wu; Rui Zhang; Concepcion R. Nierras; Philip A. Branton; Latarsha J. Carithers; Ping Guan; Helen M. Moore; Abhi Rao; Jimmie B. Vaught; Sarah E. Gould; Nicole C. Lockart; Casey Martin; Jeffery P. Struewing; Simona Volpi; Anjene M. Addington; Susan E. Koester; A. Roger Little; Lori E. Brigham; Richard Hasz; Marcus Hunter; Christopher Johns; Mark Johnson; Gene Kopen; William F. Leinweber; John T. Lonsdale; Alisa McDonald; Bernadette Mestichelli; Kevin Myer; Brian Roe; Michael Salvatore; Saboor Shad; Jeffrey A. Thomas; Gary Walters; Michael Washington; Joseph Wheeler; Jason Bridge; Barbara A. Foster; Bryan M. Gillard; Ellen Karasik; Rachna Kumar; Mark Miklos; Michael T. Moser; Scott D. Jewell; Robert G. Montroy; Daniel C. Rohrer; Dana R. Valley; David A. Davis; Deborah C. Mash; Anita H. Undale; Anna M. Smith; David E. Tabor; Nancy V. Roche; Jeffrey A. McLean; Negin Vatanian; Karna L. Robinson; Leslie Sobin; Mary E. Barcus; Kimberly M. Valentino; Liqun Qi; Steven Hunter; Pushpa Hariharan; Shilpi Singh; Ki Sung Um; Takunda Matose; Maria M. Tomaszewski; Laura K. Barker; Maghboeba Mosavel; Laura A. Siminoff; Heather M. Traino; Paul Flicek; Thomas Juettemann; Magali Ruffier; Dan Sheppard; Kieron Taylor; Stephen J. Trevanion; Daniel R. Zerbino; Brian Craft; Mary Goldman; Maximilian Haeussler; W. James Kent; Christopher M. Lee; Benedict Paten; Kate R. 
Rosenbloom; John Vivian; Jingchun Zhu

    The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTL mechanism is “mediation” by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are “cis-mediators” of trans-eQTLs, including those “cis-hubs” involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provide a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.

    Updated: 2017-11-01
  • Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Pejman Mohammadi; Stephane E. Castel; Andrew A. Brown; Tuuli Lappalainen

    Mapping cis-acting expression quantitative trait loci (cis-eQTL) has become a popular approach for characterizing proximal genetic regulatory variants. In this paper, we describe and characterize log allelic fold change (aFC), the magnitude of expression change associated with a given genetic variant, as a biologically interpretable unit for quantifying the effect size of cis-eQTLs and a mathematically convenient approach for systematic modeling of cis-regulation. This measure is mathematically independent from expression level and allele frequency, additive, applicable to multiallelic variants, and generalizable to multiple independent variants. We provide efficient tools and guidelines for estimating aFC from both eQTL and allelic expression data sets and apply it to Genotype Tissue Expression (GTEx) data. We show that aFC estimates independently derived from eQTL and allelic expression data are highly consistent, and identify technical and biological correlates of eQTL effect size. We generalize aFC to analyze genes with two eQTLs in GTEx and show that in nearly all cases the two eQTLs act independently in regulating gene expression. In summary, aFC is a solid measure of cis-regulatory effect size that allows quantitative interpretation of cellular regulatory events from population data, and it is a valuable approach for investigating novel aspects of eQTL data sets.
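
    As a rough illustration of the effect-size unit described above, the sketch below computes a log allelic fold change from allelic read counts and shows what additivity across independent variants means on the log scale. The base-2 logarithm, the pseudocount, and the function names are assumptions made for this example; they are not taken from the authors' tools.

```python
import math


def log2_afc(alt_count, ref_count, pseudocount=1.0):
    """Log2 ratio of alternative-allele to reference-allele expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when one allele has no reads.
    """
    return math.log2((alt_count + pseudocount) / (ref_count + pseudocount))


def haplotype_expression(base_expression, effects, alleles):
    """Expected expression of one haplotype under additive log2 effects.

    effects: log2 aFC of each variant (hypothetical values).
    alleles: 1 if the haplotype carries the alternative allele, else 0.
    """
    total_effect = sum(s * a for s, a in zip(effects, alleles))
    return base_expression * 2.0 ** total_effect


if __name__ == "__main__":
    # Three times as many reads from the alternative allele.
    print(log2_afc(300, 100, pseudocount=0.0))
    # Two independent variants whose log2 effects simply add up.
    print(haplotype_expression(10.0, [1.0, -2.0], [1, 1]))
```

    Because the effects are expressed as log ratios, the value does not change when both alleles are scaled by the same factor, which is one way to read the abstract's statement that the measure is independent of expression level.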

    Updated: 2017-11-01
  • Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Jack Kuipers; Katharina Jahn; Benjamin J. Raphael; Niko Beerenwinkel

    Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this assumption has never been quantitatively assessed. We developed a rigorous statistical framework to test the infinite sites assumption with single-cell sequencing data. Our framework accounts for the high noise and contamination present in such data. We found strong evidence for the same genomic position being mutationally affected multiple times in individual tumors for 11 of 12 single-cell sequencing data sets from a variety of human cancers. Seven cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Four cases exhibited a parallel mutation, potentially indicating convergent evolution at the base pair level. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.
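
    To make the infinite sites assumption concrete, the sketch below implements the classical noise-free compatibility check on a binary cell-by-mutation matrix: if every site mutates at most once and is never lost, the sets of cells carrying any two mutations must be either disjoint or nested. This is not the statistical framework described in the abstract, which explicitly models noise and contamination; the function name and the toy matrix are hypothetical.

```python
from itertools import combinations


def conflicting_mutation_pairs(genotypes):
    """Return pairs of mutations that cannot both fit one mutation tree.

    genotypes: list of rows (cells), each a list of 0/1 flags per mutation,
    with 0 taken as the ancestral (unmutated) state.
    """
    n_mutations = len(genotypes[0])
    # For each mutation, collect the indices of the cells that carry it.
    carriers = [
        {cell for cell, row in enumerate(genotypes) if row[m]}
        for m in range(n_mutations)
    ]
    conflicts = []
    for i, j in combinations(range(n_mutations), 2):
        a, b = carriers[i], carriers[j]
        overlapping = bool(a & b)
        nested = a <= b or b <= a
        if overlapping and not nested:
            conflicts.append((i, j))
    return conflicts


if __name__ == "__main__":
    toy = [
        [1, 1, 0],  # cell 0 carries mutations 0 and 1
        [1, 0, 0],  # cell 1 carries mutation 0 only
        [0, 1, 0],  # cell 2 carries mutation 1 only
        [0, 0, 1],  # cell 3 carries mutation 2 only
    ]
    print(conflicting_mutation_pairs(toy))
```

    In error-free data a conflicting pair would imply a recurrent or lost mutation. In real single-cell data, false positives and allelic dropout produce the same pattern, which is why a probabilistic test is needed instead of this direct check.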

    Updated: 2017-11-01
  • Detection of long repeat expansions from PCR-free whole-genome sequence data
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Egor Dolzhenko; Joke J.F.A. van Vugt; Richard J. Shaw; Mitchell A. Bekritsky; Marka van Blitterswijk; Giuseppe Narzisi; Subramanian S. Ajay; Vani Rajan; Bryan R. Lajoie; Nathan H. Johnson; Zoya Kingsbury; Sean J. Humphray; Raymond D. Schellevis; William J. Brands; Matt Baker; Rosa Rademakers; Maarten Kooyman; Gijs H.P. Tazelaar; Michael A. van Es; Russell McLaughlin; William Sproviero; Aleksey Shatunov; Ashley Jones; Ahmad Al Khleifat; Alan Pittman; Sarah Morgan; Orla Hardiman; Ammar Al-Chalabi; Chris Shaw; Bradley Smith; Edmund J. Neo; Karen Morrison; Pamela J. Shaw; Catherine Reeves; Lara Winterkorn; Nancy S. Wexler; The US–Venezuela Collaborative Research Group; David E. Housman; Christopher W. Ng; Alina L. Li; Ryan J. Taft; Leonard H. van den Berg; David R. Bentley; Jan H. Veldink; Michael A. Eberle

    Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
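
    The reported proportions can be checked with a few lines of arithmetic. The abstract does not say which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, which has a closed form when every trial is a success, and the function name is chosen for this example only.

```python
def exact_lower_bound_all_successes(n, confidence=0.95):
    """Lower limit of the two-sided exact binomial interval when x == n.

    With n successes out of n, the Clopper-Pearson lower limit reduces to
    (alpha / 2) ** (1 / n) and the upper limit is 1.
    """
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


if __name__ == "__main__":
    # Expanded samples: 212 of 212 flagged as expansions or potential expansions.
    print(round(exact_lower_bound_all_successes(212), 2))
    # Wild-type samples: 2786 of 2789 classified as wild type.
    print(round(2786 / 2789, 3))
    # The two groups together account for the full cohort.
    print(212 + 2789)
```

    Under this assumption the lower limit for 212/212 rounds to 0.98, consistent with the interval quoted in the abstract, and the two sample groups sum to the 3001 patients in the cohort.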

    Updated: 2017-11-01
  • High-throughput single-molecule telomere characterization
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Jennifer McCaffrey; Eleanor Young; Katy Lassahn; Justin Sibert; Steven Pastor; Harold Riethman; Ming Xiao

    We have developed a novel method that enables global subtelomere and haplotype-resolved analysis of telomere lengths at the single-molecule level. An in vitro CRISPR/Cas9 RNA-directed nickase system directs the specific labeling of human (TTAGGG)n DNA tracts in genomes that have also been barcoded using a separate nickase enzyme that recognizes a 7-bp motif genome-wide. High-throughput imaging and analysis of large DNA single molecules from genomes labeled in this fashion using a nanochannel array system permits mapping through subtelomere repeat element (SRE) regions to unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. The methodology also permits subtelomere and haplotype-resolved analyses of SRE organization and variation, providing a window into the population dynamics and potential functions of these complex and structurally variant telomere-adjacent DNA regions. At its current stage of development, the assay can be used to identify and characterize telomere length distributions of 30–35 discrete telomeres simultaneously and accurately. The assay's utility is demonstrated using early versus late passage and senescent human diploid fibroblasts, documenting the anticipated telomere attrition on a global telomere-by-telomere basis as well as identifying subtelomere-specific biases for critically short telomeres. Similarly, we present the first global single-telomere-resolved analyses of two cancer cell lines.

    Updated: 2017-11-01
  • The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Eugene J. Gardner; Vincent K. Lam; Daniel N. Harris; Nelson T. Chuang; Emma C. Scott; W. Stephen Pittard; Ryan E. Mills; The 1000 Genomes Project Consortium; Scott E. Devine

    Mobile element insertions (MEIs) represent ∼25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings.

    Updated: 2017-11-01
  • HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Tao Yang; Feipeng Zhang; Galip Gürkan Yardımcı; Fan Song; Ross C. Hardison; William Stafford Noble; Feng Yue; Qunhua Li

    Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
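
    The idea of stratifying by genomic distance can be sketched as follows: contacts are grouped by their offset from the matrix diagonal, a Pearson correlation is computed within each group, and the per-stratum values are combined as a weighted average. This is a simplified illustration and not the HiCRep implementation; it skips any smoothing or variance stabilization of the matrices, and the weight used here (stratum size times the product of the two standard deviations on raw values) and all names are assumptions of this sketch.

```python
import math


def diagonal(matrix, offset):
    """Values at a fixed distance `offset` above the main diagonal."""
    n = len(matrix)
    return [matrix[i][i + offset] for i in range(n - offset)]


def stratum_weighted_correlation(a, b, max_offset):
    """Weighted average of per-distance Pearson correlations of two matrices."""
    numerator = 0.0
    denominator = 0.0
    for d in range(1, max_offset + 1):
        x, y = diagonal(a, d), diagonal(b, d)
        n = len(x)
        if n < 2:
            continue  # a correlation needs at least two points
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
        sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
        sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
        if sd_x == 0.0 or sd_y == 0.0:
            continue  # a constant stratum carries no correlation signal
        rho = cov / (sd_x * sd_y)
        weight = n * sd_x * sd_y
        numerator += weight * rho
        denominator += weight
    return numerator / denominator if denominator else float("nan")


if __name__ == "__main__":
    contacts = [
        [10, 5, 2, 1],
        [5, 12, 6, 3],
        [2, 6, 9, 4],
        [1, 3, 4, 11],
    ]
    print(stratum_weighted_correlation(contacts, contacts, max_offset=3))
```

    Correlating within each distance stratum keeps the strong decay of contact frequency with distance from inflating the similarity score, which is the distance dependence the abstract says naive correlation ignores.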

    Updated: 2017-11-01
  • Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Mingxiang Teng; Rafael A. Irizarry

    The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics’ public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.

    Updated: 2017-11-01
  • A genome-wide interactome of DNA-associated proteins in the human liver
    Genome Res. (IF 11.922) Pub Date : 2017-11-01
    Ryne C. Ramaker; Daniel Savic; Andrew A. Hardigan; Kimberly Newberry; Gregory M. Cooper; Richard M. Myers; Sara J. Cooper

    Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ in important ways from endogenous cells and tissues. Consequently, binding data from primary human tissue are essential to improving our understanding of in vivo gene regulation. Here, we identify and analyze more than 440,000 binding sites using ChIP-seq data for 20 DAPs in two human liver tissue samples. We integrated binding data with transcriptome and phased WGS data to investigate allelic DAP interactions and the impact of heterozygous sequence variation on the expression of neighboring genes. Our tissue-based data set exhibits binding patterns more consistent with liver biology than cell lines, and we describe uses of these data to better prioritize impactful noncoding variation. Collectively, our rich data set offers novel insights into genome function in human liver tissue and provides a valuable resource for assessing disease-related disruptions.

    Updated: 2017-11-01
  • Sex-biased microRNA expression in mammals and birds reveals underlying regulatory mechanisms and a role in dosage compensation
    Genome Res. (IF 11.922) Pub Date : 2017-10-27
    Maria Warnefors; Katharina Mössinger; Jean Halbert; Tania Studer; John L. VandeBerg; Isa Lindgren; Amir Fallahshahroudi; Per Jensen; Henrik Kaessmann

    Sexual dimorphism depends on sex-biased gene expression, but the contributions of microRNAs (miRNAs) have not been globally assessed. We therefore produced an extensive small RNA sequencing data set to analyze male and female miRNA expression profiles in mouse, opossum, and chicken. Our analyses uncovered numerous cases of somatic sex-biased miRNA expression, with the largest proportion found in the mouse heart and liver. Sex-biased expression is explained by miRNA-specific regulation, including sex-biased chromatin accessibility at promoters, rather than piggybacking of intronic miRNAs on sex-biased protein-coding genes. In mouse, but not opossum and chicken, sex bias is coordinated across tissues such that autosomal testis-biased miRNAs tend to be somatically male-biased, whereas autosomal ovary-biased miRNAs are female-biased, possibly due to broad hormonal control. In chicken, which has a Z/W sex chromosome system, expression output of genes on the Z Chromosome is expected to be male-biased, since there is no global dosage compensation mechanism that restores expression in ZW females after almost all genes on the W Chromosome decayed. Nevertheless, we found that the dominant liver miRNA, miR-122-5p, is Z-linked but expressed in an unbiased manner, due to the unusual retention of a W-linked copy. Another Z-linked miRNA, the male-biased miR-2954-3p, shows conserved preference for dosage-sensitive genes on the Z Chromosome, based on computational and experimental data from chicken and zebra finch, and acts to equalize male-to-female expression ratios of its targets. Unexpectedly, our findings thus establish miRNA regulation as a novel gene-specific dosage compensation mechanism.

    Updated: 2017-10-27
  • Assessing the reliability of spike-in normalization for analyses of single-cell RNA sequencing data
    Genome Res. (IF 11.922) Pub Date : 2017-10-13
    Aaron T.L. Lun; Fernando J. Calero-Nieto; Liora Haim-Vilmovsky; Berthold Göttgens; John C. Marioni

    By profiling the transcriptomes of individual cells, single-cell RNA sequencing provides unparalleled resolution to study cellular heterogeneity. However, this comes at the cost of high technical noise, including cell-specific biases in capture efficiency and library generation. One strategy for removing these biases is to add a constant amount of spike-in RNA to each cell and to scale the observed expression values so that the coverage of spike-in transcripts is constant across cells. This approach has previously been criticized as its accuracy depends on the precise addition of spike-in RNA to each sample. Here, we perform mixture experiments using two different sets of spike-in RNA to quantify the variance in the amount of spike-in RNA added to each well in a plate-based protocol. We also obtain an upper bound on the variance due to differences in behavior between the two spike-in sets. We demonstrate that both factors are small contributors to the total technical variance and have only minor effects on downstream analyses, such as detection of highly variable genes and clustering. Our results suggest that scaling normalization using spike-in transcripts is reliable enough for routine use in single-cell RNA sequencing data analyses.
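
    The scaling strategy described above can be sketched in a few lines: each cell gets a size factor proportional to its total spike-in count, and dividing by that factor makes spike-in coverage equal across cells. Centering the size factors so that their mean is 1 is a common convention assumed here, not something stated in the abstract, and the function names and counts are hypothetical.

```python
def spike_in_size_factors(spike_totals):
    """Per-cell size factors proportional to total spike-in counts.

    The factors are divided by their mean so that they average to 1.
    """
    mean_total = sum(spike_totals) / len(spike_totals)
    return [total / mean_total for total in spike_totals]


def normalize(counts_per_cell, size_factors):
    """Divide every count in a cell by that cell's size factor."""
    return [
        [count / factor for count in cell]
        for cell, factor in zip(counts_per_cell, size_factors)
    ]


if __name__ == "__main__":
    spike_totals = [100, 200, 300]          # spike-in reads per cell
    gene_counts = [[10, 0], [20, 4], [30, 9]]  # two genes per cell
    factors = spike_in_size_factors(spike_totals)
    print(factors)
    print(normalize(gene_counts, factors))
```

    The approach is only as accurate as the assumption that every cell received the same amount of spike-in RNA, which is the source of variance the study sets out to quantify.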

    Updated: 2017-10-13
  • Single-cell gene expression analysis reveals regulators of distinct cell subpopulations among developing human neurons
    Genome Res. (IF 11.922) Pub Date : 2017-10-13
    Jiaxu Wang; Piroon Jenjaroenpun; Akshay Bhinge; Vladimir Espinosa Angarica; Antonio Del Sol; Intawat Nookaew; Vladimir A. Kuznetsov; Lawrence W. Stanton

    The stochastic dynamics and regulatory mechanisms that govern differentiation of individual human neural precursor cells (NPC) into mature neurons are currently not fully understood. Here, we used single-cell RNA-sequencing (scRNA-seq) of developing neurons to dissect/identify NPC subtypes and critical developmental stages of alternative lineage specifications. This study comprises an unsupervised, high-resolution strategy for identifying cell developmental bifurcations, tracking the stochastic transcript kinetics of the subpopulations, elucidating regulatory networks, and finding key regulators. Our data revealed the bifurcation and developmental tracks of the two NPC subpopulations, and we captured an early (24 h) transition phase that leads to alternative neuronal specifications. The consequent up-regulation and down-regulation of stage- and subpopulation-specific gene groups during the course of maturation revealed biological insights with regard to key regulatory transcription factors and lincRNAs that control cellular programs in the identified neuronal subpopulations.

    Updated: 2017-10-13
  • Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors
    Genome Res. (IF 11.922) Pub Date : 2017-10-13
    Jack Kuipers; Katharina Jahn; Benjamin J. Raphael; Niko Beerenwinkel

    Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this assumption has never been quantitatively assessed. We developed a rigorous statistical framework to test the infinite sites assumption with single-cell sequencing data. Our framework accounts for the high noise and contamination present in such data. We found strong evidence for the same genomic position being mutationally affected multiple times in individual tumors for 11 of 12 single-cell sequencing data sets from a variety of human cancers. Seven cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Four cases exhibited a parallel mutation, potentially indicating convergent evolution at the base pair level. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.

    Updated: 2017-10-13
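
    To make the infinite sites assumption in the abstract above concrete, here is a minimal sketch of its noise-free consequence. If every site mutates at most once and is never lost, no pair of sites can show all three patterns (1,0), (0,1) and (1,1) across cells. This deterministic check is only an illustration: the authors describe a statistical framework that accounts for noise and contamination, which this sketch does not attempt. The function name and toy matrices are hypothetical.

```python
from itertools import combinations


def conflicting_site_pairs(matrix):
    """Return pairs of sites (columns) that cannot both have mutated exactly once.

    matrix: one row per cell, one column per site; 1 = mutated, 0 = ancestral.
    """
    n_sites = len(matrix[0]) if matrix else 0
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        # Seeing all three patterns means no single tree explains both sites
        # without a recurrent mutation or a loss.
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            conflicts.append((i, j))
    return conflicts


consistent = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
conflicting = [[1, 0], [0, 1], [1, 1]]

ok_pairs = conflicting_site_pairs(consistent)
bad_pairs = conflicting_site_pairs(conflicting)
```

    On real single-cell data, allelic dropout and false positives produce such conflicts by chance, which is why a statistical test is needed rather than this direct check.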
  • Detection of long repeat expansions from PCR-free whole-genome sequence data
    Genome Res. (IF 11.922) Pub Date : 2017-09-08
    Egor Dolzhenko; Joke J.F.A. van Vugt; Richard J. Shaw; Mitchell A. Bekritsky; Marka van Blitterswijk; Giuseppe Narzisi; Subramanian S. Ajay; Vani Rajan; Bryan Lajoie; Nathan H. Johnson; Zoya Kingsbury; Sean J. Humphray; Raymond D. Schellevis; William J. Brands; Matt Baker; Rosa Rademakers; Maarten Kooyman; Gijs H.P. Tazelaar; Michael A. van Es; Russell McLaughlin; William Sproviero; Aleksey Shatunov; Ashley Jones; Ahmad Al Khleifat; Alan Pittman; Sarah Morgan; Orla Hardiman; Ammar Al-Chalabi; Chris Shaw; Bradley Smith; Edmund J. Neo; Karren Morrison; Pam Shaw; Catherine Reeves; Lara Winterkorn; Nancy S. Wexler; The US-Venezuela Collaborative Research Group; David E. Housman; Christopher W. Ng; Alina L. Li; Ryan J. Taft; Leonard H. van den Berg; David R. Bentley; Jan H. Veldink; Michael A. Eberle

    Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.

    Updated: 2017-10-13
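
    The confidence intervals quoted in the ExpansionHunter abstract above (212/212 and 2786/2789) can be approximated with a short worked example. The abstract does not say which interval method was used, so the Wilson score interval and the closed-form Clopper-Pearson lower bound below are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def exact_lower_bound_all_successes(total, confidence=0.95):
    """Clopper-Pearson lower bound when every trial succeeds: (alpha/2)^(1/n)."""
    alpha = 1 - confidence
    return (alpha / 2) ** (1 / total)


expanded = wilson_interval(212, 212)      # expanded samples correctly flagged
wild_type = wilson_interval(2786, 2789)   # wild-type samples correctly classified
exact_low = exact_lower_bound_all_successes(212)
```

    Both methods give a lower bound of about 0.98 for 212/212, in line with the interval reported in the abstract. The point of the example is that a perfect observed rate on 212 samples still leaves roughly two percentage points of uncertainty.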
  • Disease-specific biases in alternative splicing and tissue-specific dysregulation revealed by multitissue profiling of lymphocyte gene expression in type 1 diabetes
    Genome Res. (IF 11.922) Pub Date : 2017-10-12
    Jeremy R.B. Newman; Ana Conesa; Matthew Mika; Felicia N. New; Suna Onengut-Gumuscu; Mark A. Atkinson; Stephen S. Rich; Lauren M. McIntyre; Patrick Concannon

    Genome-wide association studies (GWAS) have identified multiple, shared allelic associations with many autoimmune diseases. However, the pathogenic contributions of variants residing in risk loci remain unresolved. The location of the majority of shared disease-associated variants in noncoding regions suggests they contribute to risk of autoimmunity through effects on gene expression in the immune system. In the current study, we test this hypothesis by applying RNA sequencing to CD4+, CD8+, and CD19+ lymphocyte populations isolated from 81 subjects with type 1 diabetes (T1D). We characterize and compare the expression patterns across these cell types for three gene sets: all genes, the set of genes implicated in autoimmune disease risk by GWAS, and the subset of these genes specifically implicated in T1D. We performed RNA sequencing and aligned the reads to both the human reference genome and a catalog of all possible splicing events developed from the genome, thereby providing a comprehensive evaluation of the roles of gene expression and alternative splicing (AS) in autoimmunity. Autoimmune candidate genes displayed greater expression specificity in the three lymphocyte populations relative to other genes, with significantly increased levels of splicing events, particularly those predicted to have substantial effects on protein isoform structure and function (e.g., intron retention, exon skipping). The majority of single-nucleotide polymorphisms within T1D-associated loci were also associated with one or more cis-expression quantitative trait loci (cis-eQTLs) and/or splicing eQTLs. Our findings highlight a substantial, and previously underrecognized, role for AS in the pathogenesis of autoimmune disorders and particularly for T1D.

    Updated: 2017-10-12
  • Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data
    Genome Res. (IF 11.922) Pub Date : 2017-10-12
    Mingxiang Teng; Rafael A. Irizarry

    The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics’ public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.

    Updated: 2017-10-12
  • Nascent RNA sequencing reveals a dynamic global transcriptional response at genes and enhancers to the natural medicinal compound celastrol
    Genome Res. (IF 11.922) Pub Date : 2017-10-12
    Noah Dukler; Gregory T. Booth; Yi-Fei Huang; Nathaniel Tippens; Colin T. Waters; Charles G. Danko; John T. Lis; Adam Siepel

    Most studies of responses to transcriptional stimuli measure changes in cellular mRNA concentrations. By sequencing nascent RNA instead, it is possible to detect changes in transcription in minutes rather than hours and thereby distinguish primary from secondary responses to regulatory signals. Here, we describe the use of PRO-seq to characterize the immediate transcriptional response in human cells to celastrol, a compound derived from traditional Chinese medicine that has potent anti-inflammatory, tumor-inhibitory, and obesity-controlling effects. Celastrol is known to elicit a cellular stress response resembling the response to heat shock, but the transcriptional basis of this response remains unclear. Our analysis of PRO-seq data for K562 cells reveals dramatic transcriptional effects soon after celastrol treatment at a broad collection of both coding and noncoding transcription units. This transcriptional response occurred in two major waves, one within 10 min, and a second 40–60 min after treatment. Transcriptional activity was generally repressed by celastrol, but one distinct group of genes, enriched for roles in the heat shock response, displayed strong activation. Using a regression approach, we identified key transcription factors that appear to drive these transcriptional responses, including members of the E2F and RFX families. We also found sequence-based evidence that particular transcription factors drive the activation of enhancers. We observed increased polymerase pausing at both genes and enhancers, suggesting that pause release may be widely inhibited during the celastrol response. Our study demonstrates that a careful analysis of PRO-seq time-course data can disentangle key aspects of a complex transcriptional response, and it provides new insights into the activity of a powerful pharmacological agent.

    Updated: 2017-10-12
  • High-throughput single-molecule telomere characterization
    Genome Res. (IF 11.922) Pub Date : 2017-10-12
    Jennifer McCaffrey; Eleanor Young; Katy Lassahn; Justin Sibert; Steven Pastor; Harold Riethman; Ming Xiao

    We have developed a novel method that enables global subtelomere and haplotype-resolved analysis of telomere lengths at the single-molecule level. An in vitro CRISPR/Cas9 RNA-directed nickase system directs the specific labeling of human (TTAGGG)n DNA tracts in genomes that have also been barcoded using a separate nickase enzyme that recognizes a 7-bp motif genome-wide. High-throughput imaging and analysis of large DNA single molecules from genomes labeled in this fashion using a nanochannel array system permits mapping through subtelomere repeat element (SRE) regions to unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. The methodology also permits subtelomere and haplotype-resolved analyses of SRE organization and variation, providing a window into the population dynamics and potential functions of these complex and structurally variant telomere-adjacent DNA regions. At its current stage of development, the assay can be used to identify and characterize telomere length distributions of 30–35 discrete telomeres simultaneously and accurately. The assay's utility is demonstrated using early versus late passage and senescent human diploid fibroblasts, documenting the anticipated telomere attrition on a global telomere-by-telomere basis as well as identifying subtelomere-specific biases for critically short telomeres. Similarly, we present the first global single-telomere-resolved analyses of two cancer cell lines.

    Updated: 2017-10-12
  • A genome-wide interactome of DNA-associated proteins in the human liver
    Genome Res. (IF 11.922) Pub Date : 2017-10-11
    Ryne C. Ramaker; Daniel Savic; Andrew A. Hardigan; Kimberly Newberry; Gregory M. Cooper; Richard M. Myers; Sara J. Cooper

    Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ in important ways from endogenous cells and tissues. Consequently, binding data from primary human tissue are essential to improving our understanding of in vivo gene regulation. Here, we identify and analyze more than 440,000 binding sites using ChIP-seq data for 20 DAPs in two human liver tissue samples. We integrated binding data with transcriptome and phased WGS data to investigate allelic DAP interactions and the impact of heterozygous sequence variation on the expression of neighboring genes. Our tissue-based data set exhibits binding patterns more consistent with liver biology than cell lines, and we describe uses of these data to better prioritize impactful noncoding variation. Collectively, our rich data set offers novel insights into genome function in human liver tissue and provides a valuable resource for assessing disease-related disruptions.

    Updated: 2017-10-12
  • Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change
    Genome Res. (IF 11.922) Pub Date : 2017-10-11
    Pejman Mohammadi; Stephane E. Castel; Andrew A. Brown; Tuuli Lappalainen

    Mapping cis-acting expression quantitative trait loci (cis-eQTL) has become a popular approach for characterizing proximal genetic regulatory variants. In this paper, we describe and characterize log allelic fold change (aFC), the magnitude of expression change associated with a given genetic variant, as a biologically interpretable unit for quantifying the effect size of cis-eQTLs and a mathematically convenient approach for systematic modeling of cis-regulation. This measure is mathematically independent from expression level and allele frequency, additive, applicable to multiallelic variants, and generalizable to multiple independent variants. We provide efficient tools and guidelines for estimating aFC from both eQTL and allelic expression data sets and apply it to Genotype Tissue Expression (GTEx) data. We show that aFC estimates independently derived from eQTL and allelic expression data are highly consistent, and identify technical and biological correlates of eQTL effect size. We generalize aFC to analyze genes with two eQTLs in GTEx and show that in nearly all cases the two eQTLs act independently in regulating gene expression. In summary, aFC is a solid measure of cis-regulatory effect size that allows quantitative interpretation of cellular regulatory events from population data, and it is a valuable approach for investigating novel aspects of eQTL data sets.

    Updated: 2017-10-12
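
    The effect-size unit in the allelic fold change abstract above can be sketched in a few lines. The sketch assumes that log aFC is the base-2 log ratio of expression from the haplotype carrying the alternative allele to expression from the haplotype carrying the reference allele, estimated here from allelic read counts. The pseudocount and the function name are hypothetical additions for illustration and are not the authors' estimator.

```python
from math import log2


def log_afc(alt_count, ref_count, pseudocount=1.0):
    """Log2 ratio of alternative-allele to reference-allele expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when one haplotype has no reads.
    """
    if alt_count < 0 or ref_count < 0 or pseudocount < 0:
        raise ValueError("counts and pseudocount must be non-negative")
    return log2((alt_count + pseudocount) / (ref_count + pseudocount))


# Toy allelic counts at a heterozygous site within one individual.
up = log_afc(30, 10, pseudocount=0.0)    # alternative haplotype expressed more
down = log_afc(10, 30, pseudocount=0.0)  # same magnitude, opposite sign
balanced = log_afc(25, 25)               # no allelic imbalance
```

    Because the measure is a ratio between the two haplotypes, doubling both counts leaves it unchanged, which is one way to read the abstract's statement that aFC is independent of expression level.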
  • Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis
    Genome Res. (IF 11.922) Pub Date : 2017-10-11
    Fan Yang; Jiebiao Wang; The GTEx Consortium; Brandon L. Pierce; Lin S. Chen; François Aguet; Kristin G. Ardlie; Beryl B. Cummings; Ellen T. Gelfand; Gad Getz; Kane Hadley; Robert E. Handsaker; Katherine H. Huang; Seva Kashin; Konrad J. Karczewski; Monkol Lek; Xiao Li; Daniel G. MacArthur; Jared L. Nedzel; Duyen T. Nguyen; Michael S. Noble; Ayellet V. Segrè; Casandra A. Trowbridge; Taru Tukiainen; Nathan S. Abell; Brunilda Balliu; Ruth Barshir; Omer Basha; Alexis Battle; Gireesh K. Bogu; Andrew Brown; Christopher D. Brown; Stephane E. Castel; Lin S. Chen; Colby Chiang; Donald F. Conrad; Nancy J. Cox; Farhan N. Damani; Joe R. Davis; Olivier Delaneau; Emmanouil T. Dermitzakis; Barbara E. Engelhardt; Eleazar Eskin; Pedro G. Ferreira; Laure Frésard; Eric R. Gamazon; Diego Garrido-Martín; Ariel D.H. Gewirtz; Genna Gliner; Michael J. Gloudemans; Roderic Guigo; Ira M. Hall; Buhm Han; Yuan He; Farhad Hormozdiari; Cedric Howald; Hae Kyung Im; Brian Jo; Eun Yong Kang; Yungil Kim; Sarah Kim-Hellmuth; Tuuli Lappalainen; Gen Li; Xin Li; Boxiang Liu; Serghei Mangul; Mark I. McCarthy; Ian C. McDowell; Pejman Mohammadi; Jean Monlong; Stephen B. Montgomery; Manuel Muñoz-Aguirre; Anne W. Ndungu; Dan L. Nicolae; Andrew B. Nobel; Meritxell Oliva; Halit Ongen; John J. Palowitch; Nikolaos Panousis; Panagiotis Papasaikas; YoSon Park; Princy Parsana; Anthony J. Payne; Christine B. Peterson; Jie Quan; Ferran Reverter; Chiara Sabatti; Ashis Saha; Michael Sammeth; Alexandra J. Scott; Andrey A. Shabalin; Reza Sodaei; Matthew Stephens; Barbara E. Stranger; Benjamin J. Strober; Jae Hoon Sul; Emily K. Tsang; Sarah Urbut; Martijn van de Bunt; Gao Wang; Xiaoquan Wen; Fred A. Wright; Hualin S. Xi; Esti Yeger-Lotem; Zachary Zappala; Judith B. Zaugg; Yi-Hui Zhou; Joshua M. Akey; Daniel Bates; Joanne Chan; Lin S. Chen; Melina Claussnitzer; Kathryn Demanelis; Morgan Diegel; Jennifer A. Doherty; Andrew P. Feinberg; Marian S. Fernando; Jessica Halow; Kasper D. Hansen; Eric Haugen; Peter F. 
Hickey; Lei Hou; Farzana Jasmine; Ruiqi Jian; Lihua Jiang; Audra Johnson; Rajinder Kaul; Manolis Kellis; Muhammad G. Kibriya; Kristen Lee; Jin Billy Li; Qin Li; Xiao Li; Jessica Lin; Shin Lin; Sandra Linder; Caroline Linke; Yaping Liu; Matthew T. Maurano; Benoit Molinie; Stephen B. Montgomery; Jemma Nelson; Fidencio J. Neri; Meritxell Oliva; Yongjin Park; Brandon L. Pierce; Nicola J. Rinaldi; Lindsay F. Rizzardi; Richard Sandstrom; Andrew Skol; Kevin S. Smith; Michael P. Snyder; John Stamatoyannopoulos; Barbara E. Stranger; Hua Tang; Emily K. Tsang; Li Wang; Meng Wang; Nicholas Van Wittenberghe; Fan Wu; Rui Zhang; Concepcion R. Nierras; Philip A. Branton; Latarsha J. Carithers; Ping Guan; Helen M. Moore; Abhi Rao; Jimmie B. Vaught; Sarah E. Gould; Nicole C. Lockart; Casey Martin; Jeffery P. Struewing; Simona Volpi; Anjene M. Addington; Susan E. Koester; A. Roger Little; Lori E. Brigham; Richard Hasz; Marcus Hunter; Christopher Johns; Mark Johnson; Gene Kopen; William F. Leinweber; John T. Lonsdale; Alisa McDonald; Bernadette Mestichelli; Kevin Myer; Brian Roe; Michael Salvatore; Saboor Shad; Jeffrey A. Thomas; Gary Walters; Michael Washington; Joseph Wheeler; Jason Bridge; Barbara A. Foster; Bryan M. Gillard; Ellen Karasik; Rachna Kumar; Mark Miklos; Michael T. Moser; Scott D. Jewell; Robert G. Montroy; Daniel C. Rohrer; Dana R. Valley; David A. Davis; Deborah C. Mash; Anita H. Undale; Anna M. Smith; David E. Tabor; Nancy V. Roche; Jeffrey A. McLean; Negin Vatanian; Karna L. Robinson; Leslie Sobin; Mary E. Barcus; Kimberly M. Valentino; Liqun Qi; Steven Hunter; Pushpa Hariharan; Shilpi Singh; Ki Sung Um; Takunda Matose; Maria M. Tomaszewski; Laura K. Barker; Maghboeba Mosavel; Laura A. Siminoff; Heather M. Traino; Paul Flicek; Thomas Juettemann; Magali Ruffier; Dan Sheppard; Kieron Taylor; Stephen J. Trevanion; Daniel R. Zerbino; Brian Craft; Mary Goldman; Maximilian Haeussler; W. James Kent; Christopher M. Lee; Benedict Paten; Kate R. 
Rosenbloom; John Vivian; Jingchun Zhu

    The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTL mechanism is “mediation” by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are “cis-mediators” of trans-eQTLs, including those “cis-hubs” involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provides a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.

    Updated: 2017-10-12
  • Co-expression networks reveal the tissue-specific regulation of transcription and splicing
    Genome Res. (IF 11.922) Pub Date : 2017-10-11
    Ashis Saha; Yungil Kim; Ariel D.H. Gewirtz; Brian Jo; Chuan Gao; Ian C. McDowell; The GTEx Consortium; Barbara E. Engelhardt; Alexis Battle; François Aguet; Kristin G. Ardlie; Beryl B. Cummings; Ellen T. Gelfand; Gad Getz; Kane Hadley; Robert E. Handsaker; Katherine H. Huang; Seva Kashin; Konrad J. Karczewski; Monkol Lek; Xiao Li; Daniel G. MacArthur; Jared L. Nedzel; Duyen T. Nguyen; Michael S. Noble; Ayellet V. Segrè; Casandra A. Trowbridge; Taru Tukiainen; Nathan S. Abell; Brunilda Balliu; Ruth Barshir; Omer Basha; Alexis Battle; Gireesh K. Bogu; Andrew Brown; Christopher D. Brown; Stephane E. Castel; Lin S. Chen; Colby Chiang; Donald F. Conrad; Nancy J. Cox; Farhan N. Damani; Joe R. Davis; Olivier Delaneau; Emmanouil T. Dermitzakis; Barbara E. Engelhardt; Eleazar Eskin; Pedro G. Ferreira; Laure Frésard; Eric R. Gamazon; Diego Garrido-Martín; Ariel D.H. Gewirtz; Genna Gliner; Michael J. Gloudemans; Roderic Guigo; Ira M. Hall; Buhm Han; Yuan He; Farhad Hormozdiari; Cedric Howald; Hae Kyung Im; Brian Jo; Eun Yong Kang; Yungil Kim; Sarah Kim-Hellmuth; Tuuli Lappalainen; Gen Li; Xin Li; Boxiang Liu; Serghei Mangul; Mark I. McCarthy; Ian C. McDowell; Pejman Mohammadi; Jean Monlong; Stephen B. Montgomery; Manuel Muñoz-Aguirre; Anne W. Ndungu; Dan L. Nicolae; Andrew B. Nobel; Meritxell Oliva; Halit Ongen; John J. Palowitch; Nikolaos Panousis; Panagiotis Papasaikas; YoSon Park; Princy Parsana; Anthony J. Payne; Christine B. Peterson; Jie Quan; Ferran Reverter; Chiara Sabatti; Ashis Saha; Michael Sammeth; Alexandra J. Scott; Andrey A. Shabalin; Reza Sodaei; Matthew Stephens; Barbara E. Stranger; Benjamin J. Strober; Jae Hoon Sul; Emily K. Tsang; Sarah Urbut; Martijn van de Bunt; Gao Wang; Xiaoquan Wen; Fred A. Wright; Hualin S. Xi; Esti Yeger-Lotem; Zachary Zappala; Judith B. Zaugg; Yi-Hui Zhou; Joshua M. Akey; Daniel Bates; Joanne Chan; Lin S. Chen; Melina Claussnitzer; Kathryn Demanelis; Morgan Diegel; Jennifer A. Doherty; Andrew P. Feinberg; Marian S. 
Fernando; Jessica Halow; Kasper D. Hansen; Eric Haugen; Peter F. Hickey; Lei Hou; Farzana Jasmine; Ruiqi Jian; Lihua Jiang; Audra Johnson; Rajinder Kaul; Manolis Kellis; Muhammad G. Kibriya; Kristen Lee; Jin Billy Li; Qin Li; Xiao Li; Jessica Lin; Shin Lin; Sandra Linder; Caroline Linke; Yaping Liu; Matthew T. Maurano; Benoit Molinie; Stephen B. Montgomery; Jemma Nelson; Fidencio J. Neri; Meritxell Oliva; Yongjin Park; Brandon L. Pierce; Nicola J. Rinaldi; Lindsay F. Rizzardi; Richard Sandstrom; Andrew Skol; Kevin S. Smith; Michael P. Snyder; John Stamatoyannopoulos; Barbara E. Stranger; Hua Tang; Emily K. Tsang; Li Wang; Meng Wang; Nicholas Van Wittenberghe; Fan Wu; Rui Zhang; Concepcion R. Nierras; Philip A. Branton; Latarsha J. Carithers; Ping Guan; Helen M. Moore; Abhi Rao; Jimmie B. Vaught; Sarah E. Gould; Nicole C. Lockart; Casey Martin; Jeffery P. Struewing; Simona Volpi; Anjene M. Addington; Susan E. Koester; A. Roger Little; Lori E. Brigham; Richard Hasz; Marcus Hunter; Christopher Johns; Mark Johnson; Gene Kopen; William F. Leinweber; John T. Lonsdale; Alisa McDonald; Bernadette Mestichelli; Kevin Myer; Brian Roe; Michael Salvatore; Saboor Shad; Jeffrey A. Thomas; Gary Walters; Michael Washington; Joseph Wheeler; Jason Bridge; Barbara A. Foster; Bryan M. Gillard; Ellen Karasik; Rachna Kumar; Mark Miklos; Michael T. Moser; Scott D. Jewell; Robert G. Montroy; Daniel C. Rohrer; Dana R. Valley; David A. Davis; Deborah C. Mash; Anita H. Undale; Anna M. Smith; David E. Tabor; Nancy V. Roche; Jeffrey A. McLean; Negin Vatanian; Karna L. Robinson; Leslie Sobin; Mary E. Barcus; Kimberly M. Valentino; Liqun Qi; Steven Hunter; Pushpa Hariharan; Shilpi Singh; Ki Sung Um; Takunda Matose; Maria M. Tomaszewski; Laura K. Barker; Maghboeba Mosavel; Laura A. Siminoff; Heather M. Traino; Paul Flicek; Thomas Juettemann; Magali Ruffier; Dan Sheppard; Kieron Taylor; Stephen J. Trevanion; Daniel R. Zerbino; Brian Craft; Mary Goldman; Maximilian Haeussler; W. 
James Kent; Christopher M. Lee; Benedict Paten; Kate R. Rosenbloom; John Vivian; Jingchun Zhu

    Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.

    Updated: 2017-10-12
  • Altered hydroxymethylation is seen at regulatory regions in pancreatic cancer and regulates oncogenic pathways
    Genome Res. (IF 11.922) Pub Date : 2017-10-06
    Sanchari Bhattacharyya; Kith Pradhan; Nathaniel Campbell; Jozef Mazdo; Aparna Vasantkumar; Shahina Maqbool; Tushar D. Bhagat; Sonal Gupta; Masako Suzuki; Yiting Yu; John M. Greally; Ulrich Steidl; James Bradner; Meelad Dawlaty; Lucy Godley; Anirban Maitra; Amit Verma

    Transcriptional deregulation of oncogenic pathways is a hallmark of cancer and can be due to epigenetic alterations. 5-Hydroxymethylcytosine (5-hmC) is an epigenetic modification that has not been studied in pancreatic cancer. Genome-wide analysis of 5-hmC-enriched loci with hmC-seal was conducted in a cohort of low-passage pancreatic cancer cell lines, primary patient-derived xenografts, and pancreatic controls and revealed strikingly altered patterns in neoplastic tissues. Differentially hydroxymethylated regions preferentially affected known regulatory regions of the genome, specifically overlapping with known H3K4me1 enhancers. Furthermore, base pair resolution analysis of cytosine methylation and hydroxymethylation with oxidative bisulfite sequencing was conducted and correlated with chromatin accessibility by ATAC-seq and gene expression by RNA-seq in pancreatic cancer and control samples. 5-hmC was specifically enriched at open regions of chromatin, and gain of 5-hmC was correlated with up-regulation of the cognate transcripts, including many oncogenic pathways implicated in pancreatic neoplasia, such as MYC, KRAS, VEGFA, and BRD4. Specifically, BRD4 was overexpressed and acquired 5-hmC at enhancer regions in the majority of neoplastic samples. Functionally, acquisition of 5-hmC at BRD4 promoter was associated with increase in transcript expression in reporter assays and primary samples. Furthermore, blockade of BRD4 inhibited pancreatic cancer growth in vivo. In summary, redistribution of 5-hmC and preferential enrichment at oncogenic enhancers is a novel regulatory mechanism in human pancreatic cancer.

    Updated: 2017-10-07
  • HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient
    Genome Res. (IF 11.922) Pub Date : 2017-08-30
    Tao Yang; Feipeng Zhang; Galip Gurkan Yardimci; Fan Song; Ross C Hardison; William Stafford Noble; Feng Yue; Qunhua Li

    Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.

    Updated: 2017-10-07
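
    The idea behind the stratum-adjusted correlation in the HiCRep abstract above can be sketched as follows: group contacts by genomic distance (one stratum per matrix diagonal), correlate the two replicates within each stratum, and combine the per-stratum correlations as a weighted average. This is a simplified illustration, not the HiCRep implementation. It skips any smoothing of the matrices, and weighting each stratum by its size times the product of raw standard deviations is an assumption of this sketch.

```python
from statistics import mean, pstdev


def stratum_adjusted_correlation(a, b, max_distance):
    """Weighted average of per-diagonal Pearson correlations of two matrices.

    a, b: square contact matrices (lists of lists) with the same dimensions.
    max_distance: largest diagonal offset (in bins) to include.
    """
    n = len(a)
    numerator = denominator = 0.0
    for d in range(1, max_distance + 1):
        x = [a[i][i + d] for i in range(n - d)]
        y = [b[i][i + d] for i in range(n - d)]
        if len(x) < 2:
            continue
        sx, sy = pstdev(x), pstdev(y)
        if sx == 0 or sy == 0:
            continue  # a constant stratum carries no correlation information
        mx, my = mean(x), mean(y)
        cov = mean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        rho = cov / (sx * sy)
        weight = len(x) * sx * sy  # assumed weighting, see lead-in
        numerator += weight * rho
        denominator += weight
    return numerator / denominator if denominator else float("nan")


# Toy 6x6 matrices: a replicate identical to the original, and a negated one.
original = [[(i * j + i + 2 * j) % 7 for j in range(6)] for i in range(6)]
negated = [[-v for v in row] for row in original]

same = stratum_adjusted_correlation(original, original, max_distance=3)
opposite = stratum_adjusted_correlation(original, negated, max_distance=3)
```

    Correlating within each diagonal removes the strong distance dependence of Hi-C contacts, which otherwise inflates a plain matrix-wide correlation even between unrelated samples.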
  • The Mobile Element Locator Tool (MELT): Population-scale mobile element discovery and biology
    Genome Res. (IF 11.922) Pub Date : 2017-08-30
    Eugene J. Gardner; Vincent K. Lam; Daniel N. Harris; Nelson T. Chuang; Emma C. Scott; William S. Pittard; Ryan E. Mills; 1000 Genomes Project Consortium; Scott E. Devine

    Mobile element insertions (MEIs) represent ∼25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings.

    Updated: 2017-10-05
  • Transposable elements are the primary source of novelty in primate gene regulation
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Marco Trizzino; YoSon Park; Marcia Holsbach-Beltrame; Katherine Aracena; Katelyn Mika; Minal Caliskan; George H. Perry; Vincent J. Lynch; Christopher D. Brown

    Gene regulation shapes the evolution of phenotypic diversity. We investigated the evolution of liver promoters and enhancers in six primate species using ChIP-seq (H3K27ac and H3K4me1) to profile cis-regulatory elements (CREs) and using RNA-seq to characterize gene expression in the same individuals. To quantify regulatory divergence, we compared CRE activity across species by testing differential ChIP-seq read depths directly measured for orthologous sequences. We show that the primate regulatory landscape is largely conserved across the lineage, with 63% of the tested human liver CREs showing similar activity across species. Conserved CRE function is associated with sequence conservation, proximity to coding genes, cell-type specificity, and transcription factor binding. Newly evolved CREs are enriched in immune response and neurodevelopmental functions. We further demonstrate that conserved CREs bind master regulators, suggesting that while CREs contribute to species adaptation to the environment, core functions remain intact. Newly evolved CREs are enriched in young transposable elements (TEs), including Long-Terminal-Repeats (LTRs) and SINE-VNTR-Alus (SVAs), that significantly affect gene expression. Conversely, only 16% of conserved CREs overlap TEs. We tested the cis-regulatory activity of 69 TE subfamilies by luciferase reporter assays, spanning all major TE classes, and showed that 95.6% of tested TEs can function as either transcriptional activators or repressors. In conclusion, we demonstrated the critical role of TEs in primate gene regulation and illustrated potential mechanisms underlying evolutionary divergence among the primate species through the noncoding genome.

    Updated: 2017-10-03
  • Massive reshaping of genome–nuclear lamina interactions during oncogene-induced senescence
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Christelle Lenain; Carolyn A. de Graaf; Ludo Pagie; Nils L. Visser; Marcel de Haas; Sandra S. de Vries; Daniel Peric-Hupkes; Bas van Steensel; Daniel S. Peeper

    Cellular senescence is a mechanism that virtually irreversibly suppresses the proliferative capacity of cells in response to various stress signals. This includes the expression of activated oncogenes, which causes Oncogene-Induced Senescence (OIS). A body of evidence points to the involvement in OIS of chromatin reorganization, including the formation of senescence-associated heterochromatic foci (SAHF). The nuclear lamina (NL) is an important contributor to genome organization and has been implicated in cellular senescence and organismal aging. It interacts with multiple regions of the genome called lamina-associated domains (LADs). Some LADs are cell-type specific, whereas others are conserved between cell types and are referred to as constitutive LADs (cLADs). Here, we used DamID to investigate the changes in genome–NL interactions in a model of OIS triggered by the expression of the common BRAF V600E oncogene. We found that OIS cells lose most of their cLADs, suggesting the loss of a specific mechanism that targets cLADs to the NL. In addition, multiple genes relocated to the NL. Unexpectedly, they were not repressed, implying the abrogation of the repressive activity of the NL during OIS. Finally, OIS cells displayed an increased association of telomeres with the NL. Our study reveals that senescent cells acquire a new type of LAD organization and suggests the existence of as yet unknown mechanisms that tether cLADs to the NL and repress gene expression at the NL.

    Updated: 2017-10-03
  • Identification of a core TP53 transcriptional program with highly distributed tumor suppressive activity
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Zdenek Andrysik; Matthew D. Galbraith; Anna L. Guarnieri; Sara Zaccara; Kelly D. Sullivan; Ahwan Pandey; Morgan MacBeth; Alberto Inga; Joaquín M. Espinosa

    The tumor suppressor TP53 is the most frequently mutated gene product in human cancer. Close to half of all solid tumors carry inactivating mutations in the TP53 gene, while in the remaining cases, TP53 activity is abrogated by other oncogenic events, such as hyperactivation of its endogenous repressors MDM2 or MDM4. Despite identification of hundreds of genes regulated by this transcription factor, it remains unclear which direct target genes and downstream pathways are essential for the tumor suppressive function of TP53. We set out to address this problem by generating multiple genomic data sets for three different cancer cell lines, allowing the identification of distinct sets of TP53-regulated genes, from early transcriptional targets through to late targets controlled at the translational level. We found that although TP53 elicits vastly divergent signaling cascades across cell lines, it directly activates a core transcriptional program of ∼100 genes with diverse biological functions, regardless of cell type or cellular response to TP53 activation. This core program is associated with high-occupancy TP53 enhancers, high levels of paused RNA polymerases, and accessible chromatin. Interestingly, two different shRNA screens failed to identify a single TP53 target gene required for the anti-proliferative effects of TP53 during pharmacological activation in vitro. Furthermore, bioinformatics analysis of thousands of cancer genomes revealed that none of these core target genes are frequently inactivated in tumors expressing wild-type TP53. These results support the hypothesis that TP53 activates a genetically robust transcriptional program with highly distributed tumor suppressive functions acting in diverse cellular contexts.

    Updated: 2017-10-03
  • Integrative analysis of RNA polymerase II and transcriptional dynamics upon MYC activation
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Stefano de Pretis; Theresia R. Kress; Marco J. Morelli; Arianna Sabò; Chiara Locarno; Alessandro Verrecchia; Mirko Doni; Stefano Campaner; Bruno Amati; Mattia Pelizzola

    Overexpression of the MYC transcription factor causes its widespread interaction with regulatory elements in the genome but leads to the up- and down-regulation of discrete sets of genes. The molecular determinants of these selective transcriptional responses remain elusive. Here, we present an integrated time-course analysis of transcription and mRNA dynamics following MYC activation in proliferating mouse fibroblasts, based on chromatin immunoprecipitation, metabolic labeling of newly synthesized RNA, extensive sequencing, and mathematical modeling. Transcriptional activation correlated with the highest increases in MYC binding at promoters. Repression followed a reciprocal scenario, with the lowest gains in MYC binding. Altogether, the relative abundance (henceforth, “share”) of MYC at promoters was the strongest predictor of transcriptional responses in diverse cell types, predominating over MYC's association with the corepressor ZBTB17 (also known as MIZ1). MYC activation elicited immediate loading of RNA polymerase II (RNAPII) at activated promoters, followed by increases in pause-release, while repressed promoters showed opposite effects. Gains and losses in RNAPII loading were proportional to the changes in the MYC share, suggesting that repression by MYC may be partly indirect, owing to competition for limiting amounts of RNAPII. Secondary to the changes in RNAPII loading, the dynamics of elongation and pre-mRNA processing were also rapidly altered at MYC regulated genes, leading to the transient accumulation of partially or aberrantly processed mRNAs. Altogether, our results shed light on how overexpressed MYC alters the various phases of the RNAPII cycle and the resulting transcriptional response.

    Updated: 2017-10-03
  • Redundant and incoherent regulations of multiple phenotypes suggest microRNAs’ role in stability control
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Zhongqi Liufu; Yixin Zhao; Li Guo; Guangxia Miao; Juan Xiao; Yang Lyu; Yuxin Chen; Suhua Shi; Tian Tang; Chung-I Wu

    Each microRNA (miRNA) represses a web of target genes and, through them, controls multiple phenotypes. The difficulties inherent in such controls cast doubt on how effective miRNAs are in driving phenotypic changes. A “simple regulation” model posits “one target–one phenotype” control under which most targeting is nonfunctional. In an alternative “coordinate regulation” model, multiple targets are assumed to control the same phenotypes coherently, and most targeting is functional. Both models have some empirical support but pose different conceptual challenges. Here, we concurrently analyze multiple targets and phenotypes associated with the miRNA-310 family (miR310s) of Drosophila. Phenotypic rescue in the mir310s knockout background is achieved by promoter-directed RNA interference that restores wild-type expression. For one phenotype (eggshell morphology), we observed redundant regulation, hence rejecting “simple regulation” in favor of the “coordinate regulation” model. For other phenotypes (egg-hatching and male fertility), however, one gene shows full rescue, but three other rescues aggravate the phenotype. Overall, phenotypic controls by miR310s do not support either model. Like a thermostat that controls both heating and cooling elements to regulate temperature, redundancy and incoherence in regulation generally suggest some capacity in stability control. Our results therefore support the published view that miRNAs play a role in the canalization of the transcriptome and, hence, of phenotypes.

    Updated: 2017-10-03
  • Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms of mutational heterogeneity
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Peng Mao; Alexander J. Brown; Ewa P. Malc; Piotr A. Mieczkowski; Michael J. Smerdon; Steven A. Roberts; John J. Wyrick

    DNA base damage is an important contributor to genome instability, but how the formation and repair of these lesions is affected by the genomic landscape and contributes to mutagenesis is unknown. Here, we describe genome-wide maps of DNA base damage, repair, and mutagenesis at single nucleotide resolution in yeast treated with the alkylating agent methyl methanesulfonate (MMS). Analysis of these maps revealed that base excision repair (BER) of alkylation damage is significantly modulated by chromatin, with faster repair in nucleosome-depleted regions, and slower repair and higher mutation density within strongly positioned nucleosomes. Both the translational and rotational settings of lesions within nucleosomes significantly influence BER efficiency; moreover, this effect is asymmetric relative to the nucleosome dyad axis and is regulated by histone modifications. Our data also indicate that MMS-induced mutations at adenine nucleotides are significantly enriched on the nontranscribed strand (NTS) of yeast genes, particularly in BER-deficient strains, due to higher damage formation on the NTS and transcription-coupled repair of the transcribed strand (TS). These findings reveal the influence of chromatin on repair and mutagenesis of base lesions on a genome-wide scale and suggest a novel mechanism for transcription-associated mutation asymmetry, which is frequently observed in human cancers.

    Updated: 2017-10-03
  • Comparative analysis of alternative polyadenylation in S. cerevisiae and S. pombe
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Xiaochuan Liu; Mainul Hoque; Marc Larochelle; Jean-François Lemay; Nathan Yurko; James L. Manley; François Bachand; Bin Tian

    Alternative polyadenylation (APA) is a widespread mechanism that generates mRNA isoforms with distinct properties. Here we have systematically mapped and compared cleavage and polyadenylation sites (PASs) in two yeast species, S. cerevisiae and S. pombe. Although >80% of the mRNA genes in each species were found to display APA, S. pombe showed greater 3′ UTR size differences among APA isoforms than did S. cerevisiae. PASs at different locations within a gene are surrounded by distinct sequences in both species and are often associated with motifs involved in the Nrd1-Nab3-Sen1 termination pathway. In S. pombe, strong motifs surrounding distal PASs lead to higher abundances of long 3′ UTR isoforms than short ones, a feature that is opposite in S. cerevisiae. Differences in PAS placement between convergent genes lead to starkly different antisense transcript landscapes between budding and fission yeasts. In both species, short 3′ UTR isoforms are more likely to be expressed when cells are growing in nutrient-rich media, although different gene groups are affected in each species. Significantly, 3′ UTR shortening in S. pombe coordinates with up-regulation of expression for genes involved in translation during cell proliferation. Using S. pombe strains deficient for Pcf11 or Pab2, we show that reduced expression of 3′-end processing factors lengthens 3′ UTRs, with Pcf11 having a more potent effect than Pab2. Taken together, our data indicate that APA mechanisms in S. pombe and S. cerevisiae are largely different: S. pombe has many of the APA features of higher species, and Pab2 in S. pombe has a different role in APA regulation than its mammalian homolog, PABPN1.

    Updated: 2017-10-03
  • RNA editing in bacteria recodes multiple proteins and regulates an evolutionarily conserved toxin-antitoxin system
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Dan Bar-Yaacov; Ernest Mordret; Ruth Towers; Tammy Biniashvili; Clara Soyris; Schraga Schwartz; Orna Dahan; Yitzhak Pilpel

    Adenosine (A) to inosine (I) RNA editing is widespread in eukaryotes. In prokaryotes, however, A-to-I RNA editing was only reported to occur in tRNAs but not in protein-coding genes. By comparing DNA and RNA sequences of Escherichia coli, we show for the first time that A-to-I editing also occurs in prokaryotic mRNAs and has the potential to affect the translated proteins and cell physiology. We found 15 novel A-to-I editing events, of which 12 occurred within known protein-coding genes where they always recode a tyrosine (TAC) into a cysteine (TGC) codon. Furthermore, we identified the tRNA-specific adenosine deaminase (tadA) as the editing enzyme of all these editing sites, thus making it the first identified RNA editing enzyme that modifies both tRNAs and mRNAs. Interestingly, several of the editing targets are self-killing toxins that belong to evolutionarily conserved toxin-antitoxin pairs. We focused on hokB, a toxin that confers antibiotic tolerance by growth inhibition, as it demonstrated the highest level of such mRNA editing. We identified a correlated mutation pattern between the edited and a DNA hard-coded Cys residue positions in the toxin and demonstrated that RNA editing occurs in hokB in two additional bacterial species. Thus, not only is the toxin evolutionarily conserved, but so is the editing within it. Finally, we found that RNA editing in hokB increases as a function of cell density and enhances its toxicity. Our work thus demonstrates the occurrence, regulation, and functional consequences of RNA editing in bacteria.

    Updated: 2017-10-03
  • Detection of structural mosaicism from targeted and whole-genome sequencing data
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Daniel A. King; Alejandro Sifrim; Tomas W. Fitzgerald; Raheleh Rahbari; Emma Hobson; Tessa Homfray; Sahar Mansour; Sarju G. Mehta; Mohammed Shehla; Susan E. Tomkins; Pradeep C. Vasudevan; Matthew E. Hurles; The Deciphering Developmental Disorders Study

    Structural mosaic abnormalities are large post-zygotic mutations present in a subset of cells and have been implicated in developmental disorders and cancer. Such mutations have been conventionally assessed in clinical diagnostics using cytogenetic or microarray testing. Modern disease studies rely heavily on exome sequencing, yet an adequate method for the detection of structural mosaicism using targeted sequencing data is lacking. Here, we present a method, called MrMosaic, to detect structural mosaic abnormalities using deviations in allele fraction and read coverage from next-generation sequencing data. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) simulations were used to calculate detection performance across a range of mosaic event sizes, types, clonalities, and sequencing depths. The tool was applied to 4911 patients with undiagnosed developmental disorders, and 11 events among nine patients were detected. For eight of these 11 events, mosaicism was observed in saliva but not blood, suggesting that assaying blood alone would miss a large fraction, possibly >50%, of mosaic diagnostic chromosomal rearrangements.

    Updated: 2017-10-03
  • Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Joshua Traynelis; Michael Silk; Quanli Wang; Samuel F. Berkovic; Liping Liu; David B. Ascher; David J. Balding; Slavé Petrovski

    Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10^-16). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.

    Updated: 2017-10-03
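
The abstract above does not state how the missense tolerance ratio is computed. The sketch below shows one way a regional ratio of this kind could work, under a loudly labeled assumption: within a sliding window of codons, the observed missense fraction (missense over missense plus synonymous variants seen in a population) is divided by the fraction expected from all possible single-nucleotide changes in the same window. The function name, the input layout, and the window handling are hypothetical and are not taken from the authors' software.

```python
def missense_tolerance_ratio(observed, possible, window):
    """Sliding-window ratio of observed to expected missense fraction (assumed definition).

    observed: per-codon (missense, synonymous) counts seen in a population.
    possible: per-codon (missense, synonymous) counts over all possible changes.
    window:   odd number of codons in each window, centred on the codon scored.
    Returns one value per codon, or None where the ratio is undefined.
    """
    half = window // 2
    ratios = []
    for center in range(len(observed)):
        lo = max(0, center - half)
        hi = min(len(observed), center + half + 1)
        obs_mis = sum(m for m, _ in observed[lo:hi])
        obs_syn = sum(s for _, s in observed[lo:hi])
        pos_mis = sum(m for m, _ in possible[lo:hi])
        pos_syn = sum(s for _, s in possible[lo:hi])
        if obs_mis + obs_syn == 0 or pos_mis == 0:
            ratios.append(None)  # no variation observed, or nothing to compare against
            continue
        observed_fraction = obs_mis / (obs_mis + obs_syn)
        expected_fraction = pos_mis / (pos_mis + pos_syn)
        ratios.append(observed_fraction / expected_fraction)
    return ratios
```

Under this definition, values well below 1 mark windows where missense variation is rarer than expected, consistent with the abstract's finding that pathogenic variants cluster in low-MTR regions.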
  • Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Ron Schwessinger; Maria C. Suciu; Simon J. McGowan; Jelena Telenius; Stephen Taylor; Doug R. Higgs; Jim R. Hughes

    In the era of genome-wide association studies (GWAS) and personalized medicine, predicting the impact of single nucleotide polymorphisms (SNPs) in regulatory elements is an important goal. Current approaches to determine the potential of regulatory SNPs depend on inadequate knowledge of cell-specific DNA binding motifs. Here, we present Sasquatch, a new computational approach that uses DNase footprint data to estimate and visualize the effects of noncoding variants on transcription factor binding. Sasquatch performs a comprehensive k-mer-based analysis of DNase footprints to determine any k-mer's potential for protein binding in a specific cell type and how this may be changed by sequence variants. Therefore, Sasquatch uses an unbiased approach, independent of known transcription factor binding sites and motifs. Sasquatch only requires a single DNase-seq data set per cell type, from any genotype, and produces consistent predictions from data generated by different experimental procedures and at different sequence depths. Here we demonstrate the effectiveness of Sasquatch using previously validated functional SNPs and benchmark its performance against existing approaches. Sasquatch is available as a versatile webtool incorporating publicly available data, including the human ENCODE collection. Thus, Sasquatch provides a powerful tool and repository for prioritizing likely regulatory SNPs in the noncoding genome.

    Updated: 2017-10-03
  • Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Paul Geeleher; Zhenyu Zhang; Fan Wang; Robert F. Gruener; Aritro Nath; Gladys Morrison; Steven Bhutra; Robert L. Grossman; R. Stephanie Huang

    Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs.

    Updated: 2017-10-03
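
The imputation workflow in the abstract above (fit a model of drug response on cell-line expression, then apply it to tumor expression) can be sketched in a few lines. Ridge regression is used here only as one plausible model choice, since the abstract does not name the statistical model. All function names are hypothetical, and the sketch ignores practical steps such as batch correction between cell-line and tumor data.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_ridge(expression, response, penalty):
    """Fit response ~ intercept + expression with an L2 penalty on gene weights.

    expression: one list of gene expression values per cell line.
    response:   one measured drug response per cell line.
    Returns [intercept, weight_gene_1, weight_gene_2, ...].
    """
    rows = [[1.0] + list(sample) for sample in expression]
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # the intercept is left unpenalized
        gram[i][i] += penalty
    moment = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(p)]
    return solve_linear_system(gram, moment)

def impute_response(weights, expression):
    """Apply fitted weights to tumor expression profiles, one prediction per tumor."""
    return [
        weights[0] + sum(w * g for w, g in zip(weights[1:], sample))
        for sample in expression
    ]
```

In use, `fit_ridge` would be called once per drug on the cell-line panel, and `impute_response` would then be applied to each patient's tumor expression profile, giving the per-drug, per-patient values that the abstract goes on to associate with somatic variants.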
  • An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Javier Tapial; Kevin C.H. Ha; Timothy Sterne-Weiler; André Gohr; Ulrich Braunschweig; Antonio Hermoso-Pulido; Mathieu Quesnel-Vallières; Jon Permanyer; Reza Sodaei; Yamile Marquez; Luca Cozzuto; Xinchen Wang; Melisa Gómez-Velázquez; Teresa Rayon; Miguel Manzanares; Julia Ponomarenko; Benjamin J. Blencowe; Manuel Irimia

    Alternative splicing (AS) generates remarkable regulatory and proteomic complexity in metazoans. However, the functions of most AS events are not known, and programs of regulated splicing remain to be identified. To address these challenges, we describe the Vertebrate Alternative Splicing and Transcription Database (VastDB), the largest resource of genome-wide, quantitative profiles of AS events assembled to date. VastDB provides readily accessible quantitative information on the inclusion levels and functional associations of AS events detected in RNA-seq data from diverse vertebrate cell and tissue types, as well as developmental stages. The VastDB profiles reveal extensive new intergenic and intragenic regulatory relationships among different classes of AS and previously unknown and conserved landscapes of tissue-regulated exons. Contrary to recent reports concluding that nearly all human genes express a single major isoform, VastDB provides evidence that at least 48% of multiexonic protein-coding genes express multiple splice variants that are highly regulated in a cell/tissue-specific manner, and that >18% of genes simultaneously express multiple major isoforms across diverse cell and tissue types. Isoforms encoded by the latter set of genes are generally coexpressed in the same cells and are often engaged by translating ribosomes. Moreover, they are encoded by genes that are significantly enriched in functions associated with transcriptional control, implying they may have an important and wide-ranging role in controlling cellular activities. VastDB thus provides an unprecedented resource for investigations of AS function and regulation.

    Updated: 2017-10-03
  • Solid-phase reverse transfection for intracellular delivery of functionally active proteins
    Genome Res. (IF 11.922) Pub Date : 2017-10-01
    Ruben Bulkescher; Vytaute Starkuviene; Holger Erfle

    Delivery of large and functionally active biomolecules across cell membranes presents a challenge in cell biological experimentation. For this purpose, we developed a novel solid-phase reverse transfection method that is suitable for the intracellular delivery of proteins into mammalian cells with preservation of their function. We show results for diverse application areas of the method, ranging from antibody-mediated inhibition of protein function to CRISPR/Cas9-based gene editing in living cells. Our method enables prefabrication of “ready to transfect” substrates carrying diverse proteins. This allows their easy distribution and standardization of biological assays across different laboratories.

    Updated: 2017-10-03