Evolutionary origins and diversification of testis-specific short histone H2A variants in mammals Genome Res. (IF 11.922) Pub Date : 2018-03-16 Antoine Molaro; Janet M. Young; Harmit S. Malik
Eukaryotic genomes must achieve both compact packaging, for genome stability and inheritance, and accessibility for gene expression. They do so using post-translational modifications of four ancient canonical histone proteins (H2A, H2B, H3, and H4) and by deploying histone variants with specialized chromatin functions. Some histone variants are conserved across all eukaryotes, whereas others are lineage-specific. Here, we performed detailed phylogenomic analyses of “short H2A histone” variants found in mammalian genomes. We discovered a previously undescribed typically sized H2A variant in monotremes and marsupials, H2A.R, which may represent the common ancestor of the short H2As. We also discovered a novel class of short H2A histone variants in eutherian mammals, H2A.Q. We show that short H2A variants arose on the X Chromosome in the common ancestor of all eutherian mammals and diverged into four evolutionarily distinct clades: H2A.B, H2A.L, H2A.P, and H2A.Q. However, the repertoires of short histone H2A variants vary extensively among eutherian mammals due to lineage-specific gains and losses. Finally, we show that all four short H2As are subject to accelerated rates of protein evolution relative to both canonical and other variant H2A proteins, including H2A.R. Our analyses reveal that short H2As are a unique class of testis-restricted histone variants displaying unprecedented evolutionary dynamism. Based on their X-Chromosomal localization, genetic turnover, and testis-specific expression, we hypothesize that short H2A variants may participate in genetic conflicts involving sex chromosomes during reproduction.
Selective maternal seeding and environment shape the human gut microbiome Genome Res. (IF 11.922) Pub Date : 2018-03-01 Katri Korpela; Paul Costea; Luis Pedro Coelho; Stefanie Kandels-Lewis; Gonneke Willemsen; Dorret I. Boomsma; Nicola Segata; Peer Bork
Vertical transmission of bacteria from mother to infant at birth is postulated to initiate a life-long host-microbe symbiosis, playing an important role in early infant development. However, only the tracking of strictly defined unique microbial strains can clarify where the intestinal bacteria come from, how long the initial colonizers persist, and whether colonization by other strains from the environment can replace existing ones. Using rare single nucleotide variants in fecal metagenomes of infants and their family members, we find strong evidence of selective and persistent transmission of maternal strain populations to the vaginally born infant and of their occasional replacement by strains from the environment, including those from family members, in later childhood. Only strains from the classes Actinobacteria and Bacteroidia, which are essential components of the infant microbiome, are transmitted from the mother and persist for at least 1 yr. In contrast, maternal strains of Clostridia, a dominant class in the mother's gut microbiome, are not observed in the infant. Caesarean-born infants show a striking lack of maternal transmission at birth. After the first year, strain influx from the family environment occurs and continues even in adulthood. Fathers appear to act more often as donors of novel strains to other family members than as receivers. Thus, the infant gut is seeded by selected maternal bacteria, which expand to form a stable community, with a rare but steady influx of new strains over time.
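The strain-tracking idea above, using rare variants as strain markers, can be sketched as a toy calculation. This is not the authors' pipeline. It assumes each sample has already been reduced to a set of (position, allele) variants for a single species, and it treats a variant as "rare" if it is absent from a small set of unrelated background samples. The function names, the toy data, and the `max_count` cutoff are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# A variant is a (genomic position, alternate allele) pair within one species' genome.
Variant = Tuple[int, str]


def rare_variants(candidates: Set[Variant], background: Iterable[Set[Variant]],
                  max_count: int = 0) -> Set[Variant]:
    """Keep candidate variants seen in at most `max_count` unrelated background samples."""
    counts = Counter(v for sample in background for v in sample)
    return {v for v in candidates if counts[v] <= max_count}


def shared_rare_fraction(donor: Set[Variant], recipient: Set[Variant],
                         rare: Set[Variant]) -> float:
    """Fraction of the recipient's rare variants that also occur in the putative donor."""
    recipient_rare = recipient & rare
    if not recipient_rare:
        return 0.0
    return len(recipient_rare & donor) / len(recipient_rare)


# Made-up toy data for illustration only.
mother = {(100, "A"), (250, "T"), (400, "G")}
infant = {(100, "A"), (250, "T"), (900, "C")}
background = [{(400, "G"), (700, "A")}, {(400, "G")}]

rare = rare_variants(mother | infant, background)
print(shared_rare_fraction(mother, infant, rare))  # 2 of the infant's 3 rare variants
```

Restricting the comparison to rare variants is what makes sharing informative: a variant common in the background population says little about whether two people carry the same strain.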
SvABA: genome-wide detection of structural variants and indels by local assembly Genome Res. (IF 11.922) Pub Date : 2018-03-13 Jeremiah A. Wala; Pratiti Bandopadhayay; Noah Greenwald; Ryan O'Rourke; Ted Sharpe; Chip Stewart; Steve Schumacher; Yilong Li; Joachim Weischenfeldt; Xiaotong Yao; Chad Nusbaum; Peter Campbell; Gad Getz; Matthew Meyerson; Cheng-Zhong Zhang; Marcin Imielinski; Rameen Beroukhim
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but it is difficult to apply genome-wide at scale because of its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and on simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20–300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50–300 bp) SVs.
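To illustrate one step the abstract names as difficult, extracting variants from assembly contigs, the sketch below walks the CIGAR string of a contig that has already been aligned to the reference. It is a simplified illustration, not SvABA's implementation. It assumes a 0-based reference start and reports only insertions (`I`) and deletions (`D`). The `min_len` threshold is a hypothetical parameter.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification alphabet).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(ref_start: int, cigar: str,
                      min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (type, ref_pos, length) for insertions and deletions in a CIGAR string.

    ref_pos is the 0-based reference coordinate at which the event occurs.
    """
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            ref_pos += length  # aligned bases consume the reference
        elif op in "DN":
            if op == "D" and length >= min_len:
                events.append(("DEL", ref_pos, length))
            ref_pos += length  # deletions and skips consume the reference
        elif op == "I":
            if length >= min_len:
                events.append(("INS", ref_pos, length))
            # insertions consume only contig bases, so ref_pos does not move
        # S, H and P consume no reference bases
    return events


print(indels_from_cigar(1000, "50M120D30M45I80M", min_len=20))
```

A real caller would also handle contigs split across several alignments (for translocations and large rearrangements), which a single CIGAR string cannot represent.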
Epigenetic activation of meiotic recombination near Arabidopsis thaliana centromeres via loss of H3K9me2 and non-CG DNA methylation Genome Res. (IF 11.922) Pub Date : 2018-03-12 Charles J. Underwood; Kyuha Choi; Christophe Lambing; Xiaohui Zhao; Heïdi Serra; Filipe Borges; Joe Simorowski; Evan Ernst; Yannick Jacob; Ian R. Henderson; Robert A. Martienssen
Eukaryotic centromeres contain the kinetochore, which connects chromosomes to the spindle, allowing segregation. During meiosis, centromeres are suppressed for inter-homolog crossover, as recombination in these regions can cause chromosome missegregation and aneuploidy. Plant centromeres are surrounded by transposon-dense pericentromeric heterochromatin that is epigenetically silenced by histone 3 lysine 9 dimethylation (H3K9me2) and by DNA methylation in CG and non-CG sequence contexts. However, the role of these chromatin modifications in the control of meiotic recombination in the pericentromeres is not fully understood. Here, we show that disruption of Arabidopsis thaliana H3K9me2 and non-CG DNA methylation pathways, for example, via mutation of the H3K9 methyltransferase genes KYP/SUVH4, SUVH5, and SUVH6, or of the CHG DNA methyltransferase gene CMT3, increases meiotic recombination in proximity to the centromeres. Using immunocytological detection of MLH1 foci and genotyping by sequencing of recombinant plants, we observe that H3K9me2 and non-CG DNA methylation pathway mutants show increased pericentromeric crossovers. Increased pericentromeric recombination in H3K9me2/non-CG mutants occurs in hybrid and inbred backgrounds and likely involves contributions from both the interfering and noninterfering crossover repair pathways. We also show, via purification and sequencing of SPO11-1-oligonucleotides, that meiotic DNA double-strand breaks (DSBs) increase within the pericentromeres in H3K9me2/non-CG mutants. Therefore, H3K9me2 and non-CG DNA methylation exert a repressive effect on both meiotic DSB and crossover formation in plant pericentromeric heterochromatin. Our results may account for selection of enhancer trap Dissociation (Ds) transposons into the CMT3 gene by recombination with proximal transposon launch-pads.
Nucleosomes and DNA methylation shape meiotic DSB frequency in Arabidopsis thaliana transposons and gene regulatory regions Genome Res. (IF 11.922) Pub Date : 2018-03-12 Kyuha Choi; Xiaohui Zhao; Andrew J. Tock; Christophe Lambing; Charles J. Underwood; Thomas J. Hardcastle; Heïdi Serra; Juhyun Kim; Hyun Seob Cho; Jaeil Kim; Piotr A. Ziolkowski; Nataliya E. Yelina; Ildoo Hwang; Robert A. Martienssen; Ian R. Henderson
Meiotic recombination initiates from DNA double-strand breaks (DSBs) generated by SPO11 topoisomerase-like complexes. Meiotic DSB frequency varies extensively along eukaryotic chromosomes, with hotspots controlled by chromatin and DNA sequence. To map meiotic DSBs throughout a plant genome, we purified and sequenced Arabidopsis thaliana SPO11-1-oligonucleotides. SPO11-1-oligos are elevated in gene promoters, terminators, and introns, which is driven by AT-sequence richness that excludes nucleosomes and allows SPO11-1 access. A positive relationship was observed between SPO11-1-oligos and crossovers genome-wide, although fine-scale correlations were weaker. This may reflect the influence of interhomolog polymorphism on crossover formation, downstream from DSB formation. Although H3K4me3 is enriched in proximity to SPO11-1-oligo hotspots at gene 5′ ends, H3K4me3 levels do not correlate with DSBs. Repetitive transposons are thought to be recombination silenced during meiosis, to prevent nonallelic interactions and genome instability. Unexpectedly, we found high SPO11-1-oligo levels in nucleosome-depleted Helitron/Pogo/Tc1/Mariner DNA transposons, whereas retrotransposons were coldspots. High SPO11-1-oligo transposons are enriched within gene regulatory regions and in proximity to immunity genes, suggesting a role as recombination enhancers. As transposon mobility in plant genomes is restricted by DNA methylation, we used the met1 DNA methyltransferase mutant to investigate the role of heterochromatin in SPO11-1-oligo distributions. Epigenetic activation of meiotic DSBs in proximity to centromeres and transposons occurred in met1 mutants, coincident with reduced nucleosome occupancy, gain of transcription, and H3K4me3. Together, our work reveals a complex relationship between chromatin and meiotic DSBs within A. thaliana genes and transposons, with significance for the diversity and evolution of plant genomes.
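The link between AT richness and SPO11-1-oligo levels can be explored by profiling AT content along a sequence. Below is a minimal sketch. It assumes a plain DNA string and uses hypothetical window and step sizes. Ambiguous bases such as `N` are left out of the denominator.

```python
from typing import List, Tuple


def at_fraction_windows(seq: str, window: int = 100,
                        step: int = 50) -> List[Tuple[int, float]]:
    """Return (window_start, AT fraction) for sliding windows along seq.

    Only A/C/G/T are counted; a window with no called bases gets NaN.
    If seq is shorter than one window, a single window covering seq is returned.
    """
    seq = seq.upper()
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        gc = chunk.count("G") + chunk.count("C")
        called = at + gc
        results.append((start, at / called if called else float("nan")))
    return results


# Toy sequence: 100 bp of pure AT followed by 100 bp of pure GC.
print(at_fraction_windows("AT" * 50 + "GC" * 50, window=100, step=50))
```

Windows could then be compared against SPO11-1-oligo coverage, for example with a rank correlation. That choice is left open here because the abstract does not specify the study's statistic.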
Conserved microRNA targeting reveals preexisting gene dosage sensitivities that shaped amniote sex chromosome evolution Genome Res. (IF 11.922) Pub Date : 2018-02-15 Sahin Naqvi; Daniel W. Bellott; Kathy S. Lin; David C. Page
Mammalian X and Y Chromosomes evolved from an ordinary autosomal pair. Genetic decay of the Y led to X Chromosome inactivation (XCI) in females, but some Y-linked genes were retained during the course of sex chromosome evolution, and many X-linked genes did not become subject to XCI. We reconstructed gene-by-gene dosage sensitivities on the ancestral autosomes through phylogenetic analysis of microRNA (miRNA) target sites and compared these preexisting characteristics to the current status of Y-linked and X-linked genes in mammals. Preexisting heterogeneities in dosage sensitivity, manifesting as differences in the extent of miRNA-mediated repression, predicted either the retention of a Y homolog or the acquisition of XCI following Y gene decay. Analogous heterogeneities among avian Z-linked genes predicted either the retention of a W homolog or gene-specific dosage compensation following W gene decay. Genome-wide analyses of human copy number variation indicate that these heterogeneities consisted of sensitivity to both increases and decreases in dosage. We propose a model of XY/ZW evolution incorporating such preexisting dosage sensitivities in determining the evolutionary fates of individual genes. Our findings thus provide a more complete view of the role of dosage sensitivity in shaping the mammalian and avian sex chromosomes and reveal an important role for post-transcriptional regulatory sequences (miRNA target sites) in sex chromosome evolution.
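miRNA target sites are commonly identified by complementarity to the miRNA seed (nucleotides 2–8). The sketch below finds canonical 7mer-m8 sites in a 3′ UTR. It is a simplified illustration, not the phylogenetic conservation analysis used in the study. The let-7a sequence serves only as a familiar example, and the UTR is made up.

```python
from typing import List

_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA (or DNA, converted to RNA) sequence."""
    rna = seq.upper().replace("T", "U")
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_sites_7mer_m8(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` of sites complementary to miRNA nt 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]  # nucleotides 2-8 (1-based)
    site = reverse_complement_rna(seed)          # matching target sequence, 5'->3'
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"    # hsa-let-7a-5p
toy_utr = "AAACUACCUCAAAGGCUACCUCUU"  # made-up UTR with two let-7 sites
print(seed_sites_7mer_m8(let7a, toy_utr))
```

Real target prediction also considers 8mer and 6mer sites, site context, and conservation across species. Conservation is the signal the study relied on to infer ancestral dosage sensitivity.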
Intrinsic DNA binding properties demonstrated for lineage-specifying basic helix-loop-helix transcription factors Genome Res. (IF 11.922) Pub Date : 2018-03-02 Bradford H. Casey; Rahul K. Kollipara; Karine Pozo; Jane E. Johnson
During development, transcription factors select distinct gene programs from a shared genome, providing the necessary regulatory complexity for temporal and tissue-specific gene expression. How related factors retain their specificity of activity, especially when they recognize the same DNA motifs, is not understood. We address this paradox using the basic helix-loop-helix (bHLH) transcription factors ASCL1, ASCL2, and MYOD1, crucial mediators of lineage specification. In vivo, these factors recognize the same DNA motifs, yet bind largely different genomic sites and regulate distinct transcriptional programs. This suggests that their ability to identify regulatory targets is defined either by the cellular environment of the partially defined lineages in which they are endogenously expressed or by intrinsic properties of the factors themselves. To distinguish between these mechanisms, we directly compared the chromatin binding properties of this subset of bHLH factors when ectopically expressed in embryonic stem cells, presenting them with a common chromatin landscape and cellular components. We find that these factors retain distinct binding sites; thus, binding specificity is an intrinsic property that does not require a restricted landscape or lineage-specific cofactors. Although the ASCL factors and MYOD1 show some distinct DNA motif preferences, these are not sufficient to explain the extent of the differential binding. All three factors can bind inaccessible chromatin as defined by ATAC-seq and MNase-seq, bind sites with similar chromatin features, and induce changes in chromatin accessibility and H3K27ac at these sites. A reiterated pattern of DNA binding motifs is uniquely enriched in inaccessible chromatin at sites bound by these bHLH factors. These combined properties define a subclass of lineage-specific bHLH factors and provide context for their central roles in development and disease.
Complete avian malaria parasite genomes reveal features associated with lineage specific evolution in birds and mammals Genome Res. (IF 11.922) Pub Date : 2018-03-02 Ulrike Böhme; Thomas D. Otto; James Cotton; Sascha Steinbiss; Mandy Sanders; Samuel O. Oyola; Antoine Nicot; Sylvain Gandon; Kailash P. Patra; Colin Herd; Ellen Bushell; Katarzyna K. Modrzynska; Oliver Billker; Joseph M. Vinetz; Ana Rivero; Chris I. Newbold; Matthew Berriman
Avian malaria parasites are prevalent around the world and infect a wide diversity of bird species. Here we report the sequencing and analysis of high-quality draft genome sequences for two avian malaria species, Plasmodium relictum and Plasmodium gallinaceum. We identify 50 genes that are specific to avian malaria, located in an otherwise conserved core of the genome that shares gene synteny with all other sequenced malaria genomes. Phylogenetic analysis suggests that the avian malaria species form an outgroup to the mammalian Plasmodium species, and, using amino acid divergence between species, we estimate that the avian- and mammalian-infective lineages diverged on the order of 10 million years ago. Consistent with their phylogenetic position, we identify orthologs of genes that had previously appeared to be restricted to the clades of parasites containing P. falciparum and P. vivax, the species with the greatest impact on human health. From these orthologs, we explore differential diversifying selection across the genus and show that the avian lineage is remarkable in the extent to which invasion-related genes are evolving. The subtelomeres of the P. relictum and P. gallinaceum genomes contain several novel gene families, including an expanded surf multigene family. We also identify an expansion of reticulocyte binding protein homologs in P. relictum, and within these proteins we detect distinct regions that are specific to non-human primate, human, rodent, and avian hosts. For the first time in the Plasmodium lineage, we find evidence of transposable elements, including several hundred fragments of LTR-retrotransposons in both species and an apparently complete LTR-retrotransposon in the genome of P. gallinaceum.
Genome-reconstruction for eukaryotes from complex natural microbial communities Genome Res. (IF 11.922) Pub Date : 2018-03-01 Patrick T West; Alexander J Probst; Igor V Grigoriev; Brian C Thomas; Jillian F Banfield
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet most published metagenome analyses ignore eukaryotes. To include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k-mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of adding organic carbon to a geyser-associated microbial community and detected a substantial change in community metabolism, with selection against almost all candidate phyla bacteria and archaea and selection for eukaryotes. Near-complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and for an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose-targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities.
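The abstract does not describe EukRep's classifier. The following is only a toy stand-in for the general idea of k-mer-based sequence classification. It builds normalized tetranucleotide profiles and assigns a contig to the nearest class centroid by cosine similarity. The nearest-centroid method, `k = 4`, and the training sequences are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Normalized frequencies of all 4**k ACGT k-mers; k-mers with other characters are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero
    return [counts[km] / total for km in kmers]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(profiles: List[List[float]]) -> List[float]:
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def classify(contig: str, training: Dict[str, List[str]], k: int = 4) -> str:
    """Assign the contig to the label whose mean k-mer profile is most similar (cosine)."""
    centroids = {label: centroid([kmer_profile(s, k) for s in seqs])
                 for label, seqs in training.items()}
    query = kmer_profile(contig, k)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Toy training sequences with artificially distinct composition (not real genomes).
training = {
    "eukaryote": ["ATATATATAT" * 10, "AATTAATTAATT" * 8],
    "prokaryote": ["GCGCGCGCGC" * 10, "GGCCGGCCGGCC" * 8],
}
print(classify("ATATATATATATAT" * 5, training))
```

In practice, a supervised classifier trained on many labeled reference genomes would replace the two-centroid rule. The profiling step would stay the same.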
A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma Genome Res. (IF 11.922) Pub Date : 2018-03-01 Kuan-Ting Lin; Wai Kit Ma; Juergen Scharner; Yun-Ru Liu; Adrian R. Krainer
Pre-mRNA splicing can contribute to the switch of cell identity that occurs in carcinogenesis. Here, we analyze a large collection of RNA-seq data sets and report that splicing changes in hepatocyte-specific enzymes, such as AFMID and KHK, are associated with survival and relapse in patients with hepatocellular carcinoma (HCC). The switch of AFMID isoforms is an early event in HCC development and is associated with driver mutations in TP53 and ARID1A. The switch of AFMID isoforms is human-specific and not detectable in other species, including other primates. Finally, we show that overexpression of the full-length AFMID isoform leads to a higher NAD+ level, lower DNA-damage response, and slower cell growth in HepG2 cells. This integrative analysis uncovered a mechanistic link between splicing switches, de novo NAD+ biosynthesis, driver mutations, and HCC recurrence.
3′ UTR lengthening as a novel mechanism in regulating cellular senescence Genome Res. (IF 11.922) Pub Date : 2018-03-01 Meng Chen; Guoliang Lyu; Miao Han; Hongbo Nie; Ting Shen; Wei Chen; Yichi Niu; Yifan Song; Xueping Li; Huan Li; Xinyu Chen; Ziyue Wang; Zheng Xia; Wei Li; Xiao-Li Tian; Chen Ding; Jun Gu; Yufang Zheng; Xinhua Liu; Jinfeng Hu; Gang Wei; Wei Tao; Ting Ni
Cellular senescence has been viewed as a tumor suppression mechanism and also as a contributor to individual aging. Widespread shortening of 3′ untranslated regions (3′ UTRs) in messenger RNAs (mRNAs) by alternative polyadenylation (APA) has recently been discovered in cancer cells. However, the role of APA in the process of cellular senescence remains elusive. Here, we found that hundreds of genes in senescent cells tended to use distal poly(A) (pA) sites, leading to a global lengthening of 3′ UTRs and reduced gene expression. Genes that harbor longer 3′ UTRs in senescent cells were enriched in senescence-related pathways. Rras2, a member of the Ras superfamily that participates in multiple signal transduction pathways, preferentially used the longer 3′ UTR and showed decreased expression in senescent cells. Depletion of Rras2 promoted senescence, while rescue of Rras2 reversed senescence-associated phenotypes. Mechanistically, the splicing factor TRA2B bound to a core “AGAA” motif located in the alternative 3′ UTR of Rras2, thereby reducing the RRAS2 protein level and causing senescence. Both proximal and distal poly(A) signals showed strong sequence conservation, highlighting the vital role of APA regulation during evolution. Our results reveal APA as a novel mechanism regulating cellular senescence.
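A per-gene shift toward distal poly(A) sites can be summarized as the change in the fraction of reads assigned to the distal site. This is a hypothetical index for illustration; the abstract does not specify the study's metric. It assumes read counts at one proximal and one distal site per gene.

```python
from typing import Tuple


def distal_usage(proximal_reads: int, distal_reads: int) -> float:
    """Fraction of poly(A)-site reads that map to the distal site."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads at either poly(A) site")
    return distal_reads / total


def lengthening_shift(young: Tuple[int, int], senescent: Tuple[int, int]) -> float:
    """Change in distal usage; positive values indicate 3' UTR lengthening.

    Each argument is a hypothetical (proximal_reads, distal_reads) pair.
    """
    return distal_usage(*senescent) - distal_usage(*young)


# Made-up counts: distal usage rises from 0.2 to 0.6 in senescent cells.
print(lengthening_shift(young=(80, 20), senescent=(40, 60)))
```

Genes with more than two poly(A) sites, or with low read counts, would need a model with replicates and a statistical test rather than a raw difference.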
Thrombopoietin signaling to chromatin elicits rapid and pervasive epigenome remodeling within poised chromatin architectures Genome Res. (IF 11.922) Pub Date : 2018-03-01 Federico Comoglio; Hyun Jung Park; Stefan Schoenfelder; Iros Barozzi; Daniel Bode; Peter Fraser; Anthony R. Green
Thrombopoietin (TPO) is a critical cytokine regulating hematopoietic stem cell maintenance and differentiation into the megakaryocytic lineage. However, the transcriptional and chromatin dynamics elicited by TPO signaling are poorly understood. Here, we study the immediate early transcriptional and cis-regulatory responses to TPO in hematopoietic stem/progenitor cells (HSPCs) and use this paradigm of cytokine signaling to chromatin to dissect the relationship between cis-regulatory activity and chromatin architecture. We show that TPO profoundly alters the transcriptome of HSPCs, with key hematopoietic regulators being transcriptionally repressed within 30 min of TPO exposure. By examining cis-regulatory dynamics and chromatin architectures, we demonstrate that these changes are accompanied by rapid and extensive epigenome remodeling of cis-regulatory landscapes that is spatially coordinated within topologically associating domains (TADs). Moreover, TPO-responsive enhancers are spatially clustered and engage in preferential homotypic intra- and inter-TAD interactions that are largely refractory to TPO signaling. By further examining the link between cis-regulatory dynamics and chromatin looping, we show that rapid modulation of cis-regulatory activity is largely independent of chromatin looping dynamics. Finally, we show that, although activated and repressed cis-regulatory elements share remarkably similar DNA sequence compositions, transcription factor binding patterns accurately predict rapid cis-regulatory responses to TPO.
ZFX acts as a transcriptional activator in multiple types of human tumors by binding downstream from transcription start sites at the majority of CpG island promoters Genome Res. (IF 11.922) Pub Date : 2018-03-01 Suhn Kyong Rhie; Lijun Yao; Zhifei Luo; Heather Witt; Shannon Schreiner; Yu Guo; Andrew A. Perez; Peggy J. Farnham
High expression of the transcription factor ZFX is correlated with proliferation, tumorigenesis, and patient survival in multiple types of human cancers. However, the mechanism by which ZFX influences transcriptional regulation has not been determined. We performed ChIP-seq in four cancer cell lines (representing kidney, colon, prostate, and breast cancers) to identify ZFX binding sites throughout the human genome. We identified roughly 9000 ZFX binding sites and found that most of the sites are in CpG island promoters. Moreover, genes with promoters bound by ZFX are expressed at higher levels than genes with promoters not bound by ZFX. To determine whether ZFX contributes to regulation of the promoters to which it is bound, we performed RNA-seq analysis after knockdown of ZFX by siRNA in prostate and breast cancer cells. Many genes with promoters bound by ZFX were down-regulated upon ZFX knockdown, supporting the hypothesis that ZFX acts as a transcriptional activator. Surprisingly, ZFX binds at +240 bp, downstream from the TSS of the responsive promoters. Using Nucleosome Occupancy and Methylome Sequencing (NOMe-seq), we show that ZFX binds between the open chromatin region at the TSS and the first downstream nucleosome, suggesting that ZFX may play a critical role in promoter architecture. We also show that the closely related zinc finger protein ZNF711 has a similar binding pattern at CpG island promoters, but ZNF711 may play a subordinate role to ZFX. This functional characterization of ZFX provides important new insights into transcription, chromatin structure, and the regulation of the cancer transcriptome.
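A position such as "+240 bp downstream from the TSS" depends on strand-aware coordinates: on the minus strand, downstream means decreasing genomic position. Below is a minimal sketch. It assumes hypothetical (summit, TSS, strand) records with integer genomic coordinates.

```python
from statistics import median
from typing import Iterable, Tuple


def signed_distance_to_tss(summit: int, tss: int, strand: str) -> int:
    """Distance from TSS to a binding-site summit; positive means downstream (direction of transcription)."""
    if strand == "+":
        return summit - tss
    if strand == "-":
        return tss - summit
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def median_offset(sites: Iterable[Tuple[int, int, str]]) -> float:
    """Median signed distance over hypothetical (summit, tss, strand) records."""
    return median(signed_distance_to_tss(s, t, strand) for s, t, strand in sites)


# Made-up peaks: two at +240 (one per strand) and one at +300.
peaks = [(10_240, 10_000, "+"), (9_760, 10_000, "-"), (5_300, 5_000, "+")]
print(median_offset(peaks))
```

Using the median rather than the mean keeps a few distal peaks, assigned to the nearest TSS by chance, from pulling the summary position.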
Relationship between histone modifications and transcription factor binding is protein family specific Genome Res. (IF 11.922) Pub Date : 2018-03-01 Beibei Xin; Remo Rohs
The very small fraction of putative binding sites (BSs) that are occupied by transcription factors (TFs) in vivo can be highly variable across different cell types. This observation has been partly attributed to changes in chromatin accessibility and histone modification (HM) patterns surrounding BSs. Previous studies focusing on BSs within DNA regulatory regions found correlations between HM patterns and TF binding specificities. However, a mechanistic understanding of TF–DNA binding specificity determinants is still not available. The ability to predict in vivo TF binding on a genome-wide scale requires the identification of features that determine TF binding based on evolutionary relationships of DNA binding proteins. To reveal protein family–dependent mechanisms of TF binding, we conducted comprehensive comparisons of HM patterns surrounding BSs and non-BSs with exactly matched core motifs for TFs in three cell lines: 33 TFs in GM12878, 37 TFs in K562, and 18 TFs in H1-hESC. These TFs displayed protein family–specific preferences for HM patterns surrounding BSs, with high agreement among cell lines. Moreover, compared with models based on DNA sequence and shape at flanking regions of BSs, HM-augmented quantitative machine-learning methods resulted in increased performance in a TF family–specific manner. Analysis of the relative importance of features in these models indicated that TFs displaying larger HM pattern differences between BSs and non-BSs bound DNA in an HM-specific manner on a protein family–specific basis. We propose that TF family–specific HM preferences reveal distinct mechanisms that assist in guiding TFs to their cognate BSs by altering chromatin structure and accessibility.
Enhancer RNA profiling predicts transcription factor activity Genome Res. (IF 11.922) Pub Date : 2018-03-01 Joseph G. Azofeifa; Mary A. Allen; Josephina R. Hendrix; Timothy Read; Jonathan D. Rubin; Robin D. Dowell
Transcription factors (TFs) exert their regulatory influence through the binding of enhancers, resulting in coordination of gene expression programs. Active enhancers are often characterized by the presence of short, unstable transcripts termed enhancer RNAs (eRNAs). While their function remains unclear, we demonstrate that eRNAs are a powerful readout of TF activity. We infer sites of eRNA origination across hundreds of publicly available nascent transcription data sets and show that eRNAs initiate from sites of TF binding. By quantifying the colocalization of TF binding motif instances and eRNA origins, we derive a simple statistic capable of inferring TF activity. In doing so, we uncover dozens of previously unexplored links between diverse stimuli and the TFs they affect.
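The motif–eRNA colocalization statistic can be sketched as the fraction of motif instances within a wide window of an eRNA origin that also fall within a narrow window. The exact statistic is defined in the paper. Here the window sizes (150 bp and 1,500 bp) and the function name are assumptions. Positions are taken to lie on a single chromosome, and a motif near two origins is counted once per origin.

```python
import bisect
from typing import List


def colocalization_score(origins: List[int], motifs: List[int],
                         h: int = 150, H: int = 1500) -> float:
    """Fraction of motif hits within H bp of eRNA origins that are also within h bp.

    Values well above the background expectation (about h / H for motifs scattered
    uniformly around origins) suggest the motif is enriched at eRNA origins.
    """
    motifs = sorted(motifs)

    def hits_within(center: int, radius: int) -> int:
        # Count motif positions in the closed interval [center - radius, center + radius].
        return (bisect.bisect_right(motifs, center + radius)
                - bisect.bisect_left(motifs, center - radius))

    near = sum(hits_within(o, h) for o in origins)
    far = sum(hits_within(o, H) for o in origins)
    return near / far if far else 0.0


# Made-up positions: two of the four motifs near the first origin lie within 150 bp.
print(colocalization_score([10_000, 50_000], [10_050, 9_900, 11_000, 8_800]))
```

Comparing the score for the same motif across two conditions (for example, before and after a stimulus) is one way such a statistic could flag TFs whose activity changes.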
Targeted deletion of a 170-kb cluster of LINE-1 repeats and implications for regional control Genome Res. (IF 11.922) Pub Date : 2018-03-01 Miguel L. Soares; Carol A. Edwards; Frances L. Dearden; Sacri R. Ferrón; Scott Curran; Jennifer A. Corish; Rebecca C. Rancourt; Sarah E. Allen; Marika Charalambous; Malcolm A. Ferguson-Smith; Willem Rens; David J. Adams; Anne C. Ferguson-Smith
Approximately half the mammalian genome is composed of repetitive sequences, and accumulating evidence suggests that some may have an impact on genome function. Here, we characterized a class of large arrays of long interspersed element-1 (LINE-1) repeats. Although such arrays are widely distributed in mammals, their locations are species specific. Using targeted deletion, we asked whether a 170-kb LINE-1 array located at a mouse imprinted domain might function as a modulator of local transcriptional control. The LINE-1 array is lamina associated in differentiated ES cells, consistent with its AT richness, and although imprinting occurs both proximally and distally to the array, active LINE-1 transcripts within the tract are biallelically expressed. Upon deletion of the array, no perturbation of imprinting was observed, and abnormal phenotypes were not detected in maternal or paternal heterozygous or homozygous mutant mice. The array does not shield nonimprinted genes in the vicinity from local imprinting control. Reduced neural expression of protein-coding genes observed upon paternal transmission of the deletion is likely due to the removal of a brain-specific enhancer embedded within the LINE array. Our findings suggest that the presence of a 170-kb LINE-1 array reflects the tolerance of the site for repeat insertion rather than an important genomic function in normal development.
Widespread and precise reprogramming of yeast protein–genome interactions in response to heat shock Genome Res. (IF 11.922) Pub Date : 2018-03-01 Vinesh Vinayachandran; Rohit Reja; Matthew J. Rossi; Bongsoo Park; Lila Rieber; Chitvan Mittal; Shaun Mahony; B. Franklin Pugh
Gene expression is controlled by a variety of proteins that interact with the genome. Their precise organization and mechanisms of action at every promoter remain to be worked out. To better understand the physical interplay among genome-interacting proteins, we examined the temporal binding of a functionally diverse subset of these proteins: nucleosomes (H3), H2AZ (Htz1), SWR (Swr1), RSC (Rsc1, Rsc3, Rsc58, Rsc6, Rsc9, Sth1), SAGA (Spt3, Spt7, Ubp8, Sgf11), Hsf1, TFIID (Spt15/TBP and Taf1), TFIIB (Sua7), TFIIH (Ssl2), FACT (Spt16), Pol II (Rpb3), and Pol II carboxyl-terminal domain (CTD) phosphorylation at serines 2, 5, and 7. Binding was examined under normal and acute heat shock conditions using the ultrahigh-resolution genome-wide ChIP-exo assay in Saccharomyces cerevisiae. Our findings reveal a precise positional organization of proteins bound at most genes, some of which rapidly reorganize within minutes of heat shock. This includes more precise positional transitions of Pol II CTD phosphorylation along the 5′ ends of genes than previously seen. Reorganization upon heat shock includes colocalization of SAGA with promoter-bound Hsf1, a change in RSC subunit enrichment from gene bodies to promoters, and Pol II accumulation within promoter/+1 nucleosome regions. Most of these events are widespread and not necessarily coupled to changes in gene expression. Together, these findings reveal protein–genome interactions that are robustly reprogrammed in precise and uniform ways far beyond what is elicited by changes in gene expression.
CRISPR RNAs trigger innate immune responses in human cells Genome Res. (IF 11.922) Pub Date : 2018-03-01 Sojung Kim; Taeyoung Koo; Hyeon-Gun Jee; Hee-Yeon Cho; Gyeorae Lee; Dong-Gyun Lim; Hyoung Shik Shin; Jin-Soo Kim
Here, we report that CRISPR guide RNAs (gRNAs) with a 5′-triphosphate group (5′-ppp gRNAs) produced via in vitro transcription trigger RNA-sensing innate immune responses in human and murine cells, leading to cytotoxicity. 5′-ppp gRNAs in the cytosol are recognized by DDX58, which in turn activates type I interferon responses, causing up to ∼80% cell death. We show that the triphosphate group can be removed by a phosphatase in vitro and that the resulting 5′-hydroxyl gRNAs in complex with Cas9 or Cpf1 avoid innate immune responses and can achieve targeted mutagenesis at a frequency of 95% in primary human CD4+ T cells. These results are in line with previous findings that chemically synthesized sgRNAs with a 5′-hydroxyl group are much more efficient than in vitro–transcribed (IVT) sgRNAs in human and other mammalian cells. The phosphatase treatment of IVT sgRNAs is a cost-effective method for making highly active sgRNAs, avoiding innate immune responses in human cells.
Targeting mutant KRAS with CRISPR-Cas9 controls tumor growth Genome Res. (IF 11.922) Pub Date : 2018-03-01 Wonjoo Kim; Sangeun Lee; Han Sang Kim; Minjung Song; Yong Hoon Cha; Young-Hoon Kim; Jeonghong Shin; Eun-Seo Lee; Yeonsoo Joo; Jae J. Song; Eun Ju Choi; Jae W. Choi; Jinu Lee; Moonkyung Kang; Jong In Yook; Min Goo Lee; Yeon-Soo Kim; Soonmyung Paik; Hyongbum (Henry) Kim
KRAS is the most frequently mutated oncogene in human tumors, and its activating mutations represent important therapeutic targets. The combination of Cas9 and guide RNA from the CRISPR-Cas system recognizes a specific DNA sequence and makes a double-strand break, which enables editing of the relevant genes. Here, we harnessed CRISPR to specifically target mutant KRAS alleles in cancer cells. We screened guide RNAs using a reporter system and validated them in cancer cells after lentiviral delivery of Cas9 and guide RNA. The survival, proliferation, and tumorigenicity of cancer cells in vitro and the growth of tumors in vivo were measured after delivery of Cas9 and guide RNA. We identified guide RNAs that efficiently target mutant KRAS without significant alterations of the wild-type allele. Doxycycline-inducible expression of one such guide RNA in KRAS-mutant cancer cells transduced with a lentiviral vector encoding Cas9 disrupted the mutant KRAS gene, leading to inhibition of cancer cell proliferation both in vitro and in vivo. Intra-tumoral injection of lentivirus and adeno-associated virus expressing Cas9 and sgRNA suppressed tumor growth in vivo, albeit incompletely, in immunodeficient mice. Expression of Cas9 and the guide RNA in cells containing wild-type KRAS did not alter cell survival or proliferation either in vitro or in vivo. Our study provides proof of concept that CRISPR can be used to target cancer driver mutations in vitro and in vivo.
Reconstructing differentiation networks and their regulation from time series single-cell expression data Genome Res. (IF 11.922) Pub Date : 2018-03-01 Jun Ding; Bruce J. Aronow; Naftali Kaminski; Joseph Kitzmiller; Jeffrey A. Whitsett; Ziv Bar-Joseph
Generating detailed and accurate organogenesis models using single-cell RNA-seq data remains a major challenge. Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels. This assumption does not always hold for in vivo studies, which often include infrequently sampled, unsynchronized, and diverse cell populations. Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs) that are active during advancing stages of organogenesis. To enable such modeling, we have developed a method that learns a probabilistic model integrating expression similarity with regulatory information to reconstruct dynamic developmental cell trajectories. When applied to mouse lung developmental data, the method accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.
SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification Genome Res. (IF 11.922) Pub Date : 2018-03-01 Manuel Tardaguila; Lorena de la Fuente; Cristina Marti; Cécile Pereira; Francisco Jose Pardo-Palacios; Hector del Risco; Marc Ferrell; Maravillas Mellado; Marissa Macchietto; Kenneth Verheggen; Mariola Edelmann; Iakes Ezkurdia; Jesus Vazquez; Michael Tress; Ali Mortazavi; Lennart Martens; Susana Rodriguez-Navarro; Victoria Moreno-Manzano; Ana Conesa
High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. These advances in sequencing technology have created a need for studies and tools that can characterize such novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of the data and of the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive PCR evaluation of ToFU PacBio transcripts, revealing that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than in novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the accurate quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms largely elude proteogenomic detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
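The abstract reports that most novel transcripts are novel combinations of existing splice sites. As a rough illustration of that kind of structural call, the sketch below classifies a read's splice-junction chain against annotated chains. The category names mirror common long-read QC vocabulary but are used here as assumptions, and the matching rules are deliberately simplified (no strand, transcript-end, or fuzzy-junction handling). This is not SQANTI's implementation.

```python
# Hypothetical, simplified junction-chain classifier; not SQANTI's code.
Junction = tuple[int, int]  # (donor, acceptor) coordinates of one intron


def classify(query: list[Junction], reference: list[list[Junction]]) -> str:
    """Classify a read by comparing its ordered junction chain with annotated chains."""
    if not query:
        return "mono_exonic"  # no junctions to compare
    q = tuple(query)
    chains = [tuple(r) for r in reference]
    # Identical ordered chain to an annotated transcript.
    if q in chains:
        return "full_splice_match"
    # Contiguous sub-chain of an annotated transcript (e.g. a truncated read).
    n = len(q)
    for r in chains:
        if any(r[i:i + n] == q for i in range(len(r) - n + 1)):
            return "incomplete_splice_match"
    # Every donor and acceptor is annotated, but they are combined in a new way.
    known_donors = {d for r in chains for d, _ in r}
    known_acceptors = {a for r in chains for _, a in r}
    if all(d in known_donors and a in known_acceptors for d, a in q):
        return "novel_in_catalog"
    # At least one donor or acceptor never appears in the annotation.
    return "novel_not_in_catalog"


reference = [[(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)]]
print(classify([(300, 400), (500, 600)], reference))  # incomplete_splice_match
print(classify([(100, 400), (500, 600)], reference))  # novel_in_catalog
```

In the second call, the junction 100→400 joins an annotated donor to an annotated acceptor and skips the exon between them. It is therefore a new combination of known sites rather than a new site.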
FIND: difFerential chromatin INteractions Detection using a spatial Poisson process Genome Res. (IF 11.922) Pub Date : 2018-03-01 Mohamed Nadhir Djekidel; Yang Chen; Michael Q. Zhang
Polymer-based simulations and experimental studies indicate the existence of a spatial dependency between the adjacent DNA fibers involved in the formation of chromatin loops. However, existing strategies for detecting differential chromatin interactions assume that the interacting segments are spatially independent of the other segments nearby. To resolve this issue, we developed a new computational method, FIND, which considers the local spatial dependency between interacting loci. FIND uses a spatial Poisson process to detect differential chromatin interactions that show a significant difference both in their own interaction frequency and in the interaction frequency of their neighbors. Simulation and biological data analysis show that FIND outperforms the widely used count-based methods and has a better signal-to-noise ratio.
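FIND itself models contacts with a spatial Poisson process. The much simpler sketch below only illustrates the underlying intuition of borrowing strength from neighbors. It pools integer counts from a bin pair and its immediate neighbors in two (assumed already normalized) contact matrices. It then applies the exact conditional binomial test to ask whether the two pooled counts are consistent with equal Poisson rates. The window radius, the pooling rule, and all function names are hypothetical, and this is not FIND's algorithm.

```python
from math import exp, lgamma, log


def neighborhood_count(matrix: list[list[int]], i: int, j: int, radius: int = 1) -> int:
    """Sum contacts in the (2*radius+1) x (2*radius+1) window around bin pair (i, j)."""
    n = len(matrix)
    return sum(
        matrix[a][b]
        for a in range(max(0, i - radius), min(n, i + radius + 1))
        for b in range(max(0, j - radius), min(n, j + radius + 1))
    )


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    # Log space avoids overflow of the binomial coefficient for large counts.
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log(1 - p)


def poisson_rate_test(x: int, y: int, ratio: float = 1.0) -> float:
    """Exact two-sided test of H0: rate_x / rate_y == ratio for two Poisson counts.

    Given x + y = n, x ~ Binomial(n, ratio / (1 + ratio)) under H0. The p-value
    sums the probabilities of all outcomes no more likely than the observed x.
    """
    n = x + y
    p = ratio / (1.0 + ratio)
    probs = [exp(_log_binom_pmf(k, n, p)) for k in range(n + 1)]
    cutoff = probs[x] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


cond_a = [[5, 5, 5], [5, 20, 5], [5, 5, 5]]
cond_b = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
x = neighborhood_count(cond_a, 1, 1)  # 60
y = neighborhood_count(cond_b, 1, 1)  # 45
print(poisson_rate_test(x, y))
```

Pooling neighbors raises the counts entering the test, which tends to stabilize it. The cost is that a change confined to a single bin pair gets diluted.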
Evolutionary expansion of DNA hypomethylation in the mammalian germline genome Genome Res. (IF 11.922) Pub Date : 2018-02-01 Jianghan Qu; Emily Hodges; Antoine Molaro; Pascal Gagneux; Matthew D. Dean; Gregory J. Hannon; Andrew D. Smith
DNA methylation in the germline is among the most important factors influencing the evolution of mammalian genomes. Yet little is known about its evolutionary rate or the fraction of the methylome that has undergone change. We compared whole-genome, single-CpG DNA methylation profiles in sperm of seven species (human, chimpanzee, gorilla, rhesus macaque, mouse, rat, and dog) to investigate epigenomic evolution. We developed a phylo-epigenetic model for DNA methylation that accommodates the correlation of states at neighboring sites and allows for inference of ancestral states. Applying this model to the sperm methylomes, we uncovered an overall evolutionary expansion of the hypomethylated fraction of the genome, driven both by the birth of new hypomethylated regions and by extensive widening of hypomethylated intervals in ancestral species. This expansion shows strong lineage-specific aspects, most notably that hypomethylated intervals around transcription start sites have evolved to be considerably wider in primates and dog than in rodents, whereas rodents show evidence of a greater trend toward the birth of new hypomethylated regions. Lineage-specific hypomethylated regions are enriched near sets of genes that share developmental functions and that overlap significantly across lineages. Rodent-specific and primate-specific hypomethylated regions are enriched for binding sites of similar transcription factors, suggesting that the plasticity accommodated by certain regulatory factors is conserved, despite substantial change in the specific sites of regulation. Overall, our results reveal substantial global epigenomic change in mammalian sperm methylomes and point to a divergence in trans-epigenetic mechanisms that govern the organization of epigenetic states at gene promoters.
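The phylo-epigenetic model in this study also couples neighboring CpG sites. The sketch below drops that coupling and shows only a common per-branch building block: a two-state continuous-time Markov chain between hypomethylated (0) and methylated (1) states. It adds root-state inference on a star-shaped tree, such as two sister species. The rates `a` and `b`, the use of the stationary distribution as the root prior, and the state labels are illustrative assumptions, not the authors' parameterization.

```python
from math import exp


def transition_matrix(a: float, b: float, t: float) -> list[list[float]]:
    """Two-state CTMC transition probabilities after branch length t.

    State 0 = hypomethylated, 1 = methylated (illustrative labels).
    a: rate 0 -> 1, b: rate 1 -> 0 (both > 0).
    """
    s = a + b
    decay = exp(-s * t)
    p00 = b / s + (a / s) * decay
    p11 = a / s + (b / s) * decay
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]


def root_posterior(leaf_states: list[int], branch_lengths: list[float],
                   a: float, b: float) -> list[float]:
    """Posterior over the root state of a star tree whose leaf states are observed."""
    s = a + b
    prior = [b / s, a / s]  # stationary distribution used as the root prior
    joint = []
    for root in (0, 1):
        prob = prior[root]
        for state, t in zip(leaf_states, branch_lengths):
            prob *= transition_matrix(a, b, t)[root][state]
        joint.append(prob)
    total = sum(joint)
    return [p / total for p in joint]


# Two closely related leaves that are both hypomethylated at this site.
print(root_posterior([0, 0], [0.1, 0.1], a=1.0, b=1.0))
```

With short branches and matching leaves, almost all posterior mass falls on the shared state. The neighbor-aware model in the paper would additionally let adjacent sites inform each other.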
Enhancer transcription reveals subtype-specific gene expression programs controlling breast cancer pathogenesis Genome Res. (IF 11.922) Pub Date : 2018-02-01 Hector L. Franco; Anusha Nagari; Venkat S. Malladi; Wenqian Li; Yuanxin Xi; Dana Richardson; Kendra L. Allton; Kaori Tanaka; Jing Li; Shino Murakami; Khandan Keyomarsi; Mark T. Bedford; Xiaobing Shi; Wei Li; Michelle C. Barton; Sharon Y.R. Dent; W. Lee Kraus
Noncoding transcription is a defining feature of active enhancers, linking transcription factor (TF) binding to the molecular mechanisms controlling gene expression. To determine the relationship between enhancer activity and biological outcomes in breast cancers, we profiled the transcriptomes (using GRO-seq and RNA-seq) and epigenomes (using ChIP-seq) of 11 different human breast cancer cell lines representing five major molecular subtypes of breast cancer, as well as two immortalized (“normal”) human breast cell lines. In addition, we developed a robust and unbiased computational pipeline that simultaneously identifies putative subtype-specific enhancers and their cognate TFs by integrating the magnitude of enhancer transcription, TF mRNA expression levels, TF motif P-values, and enrichment of H3K4me1 and H3K27ac. When applied across the 13 cell lines noted above, this pipeline, the Total Functional Score of Enhancer Elements (TFSEE), identified key breast cancer subtype-specific TFs that act at transcribed enhancers to dictate gene expression patterns determining growth outcomes, including Forkhead TFs, FOSL1, and PLAG1. FOSL1, a Fos family TF, (1) is highly enriched at the enhancers of triple-negative breast cancer (TNBC) cells, (2) acts as a key regulator of the proliferation and viability of TNBC cells, but not Luminal A cells, and (3) is associated with a poor prognosis in patients with TNBC. Taken together, our results validate our enhancer identification pipeline and reveal that enhancers transcribed in breast cancer cells direct critical gene regulatory networks that promote pathogenesis.
Local sequence features that influence AP-1 cis-regulatory activity Genome Res. (IF 11.922) Pub Date : 2018-02-01 Hemangi G. Chaudhari; Barak A. Cohen
In the genome, most occurrences of transcription factor binding sites (TFBSs) have no cis-regulatory activity, which suggests that flanking sequences contain information that distinguishes functional from nonfunctional TFBSs. We interrogated the role of flanking sequences near Activator Protein 1 (AP-1) binding sites that reside in DNase I hypersensitive sites (DHSs) and in regions annotated as enhancers. In these regions, we found that sequence features directly adjacent to the core motif distinguish high-activity from low-activity AP-1 sites. Some nearby features are motifs for other TFs that genetically interact with the AP-1 site. Other features are extensions of the AP-1 core motif, which cause the extended sites to match motifs of multiple AP-1 binding proteins. Computational models trained on these data distinguish between sequences with high- and low-activity AP-1 sites and also predict changes in cis-regulatory activity due to mutations in AP-1 core sites and their flanking sequences. Our results suggest that extended AP-1 binding sites, together with adjacent binding sites for additional TFs, encode part of the information that governs TFBS activity in the genome.
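To make the notion of flanking sequence concrete, the sketch below finds matches to the commonly cited AP-1 core consensus TGA(C/G)TCA and returns a fixed-length flank on each side. TGACTCA and TGAGTCA are reverse complements of each other, so scanning one strand recovers sites in both orientations. The 5-bp flank length and the helper name are assumptions for illustration. The study's actual feature extraction and models are not reproduced here.

```python
import re

# Lookahead so that overlapping matches are also reported.
AP1_CORE = re.compile(r"(?=(TGA[CG]TCA))")


def ap1_sites_with_flanks(seq: str, flank: int = 5) -> list[tuple[int, str, str, str]]:
    """Return (start, left_flank, core, right_flank) for each AP-1 core match."""
    seq = seq.upper()
    sites = []
    for match in AP1_CORE.finditer(seq):
        start = match.start()
        core = match.group(1)
        end = start + len(core)
        left = seq[max(0, start - flank):start]  # shorter near the sequence start
        right = seq[end:end + flank]             # shorter near the sequence end
        sites.append((start, left, core, right))
    return sites


print(ap1_sites_with_flanks("ccgatTGACTCAgcatt"))  # [(5, 'CCGAT', 'TGACTCA', 'GCATT')]
```

The returned flanks are what a downstream model would featurize, for example as k-mer counts or as matches to motifs of other TFs.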
Transcription factor activity rhythms and tissue-specific chromatin interactions explain circadian gene expression across organs Genome Res. (IF 11.922) Pub Date : 2018-02-01 Jake Yeung; Jérôme Mermet; Céline Jouffe; Julien Marquis; Aline Charpagne; Frédéric Gachon; Felix Naef
Temporal control of physiology requires the interplay between gene networks involved in daily timekeeping and tissue function across different organs. How the circadian clock interweaves with tissue-specific transcriptional programs is poorly understood. Here, we dissected temporal and tissue-specific regulation at multiple gene regulatory layers by examining mouse tissues with an intact or disrupted clock over time. Integrated analysis uncovered two distinct regulatory modes underlying tissue-specific rhythms: tissue-specific oscillations in transcription factor (TF) activity, which were linked to feeding-fasting cycles in liver and sodium homeostasis in kidney; and colocalized binding of clock and tissue-specific transcription factors at distal enhancers. Chromosome conformation capture (4C-seq) in liver and kidney identified liver-specific chromatin loops that recruited clock-bound enhancers to promoters to regulate liver-specific transcriptional rhythms. Furthermore, this looping was remarkably promoter-specific on the scale of less than 10 kilobases (kb). Enhancers can contact a rhythmic promoter while looping out nearby nonrhythmic alternative promoters, confining rhythmic enhancer activity to specific promoters. These findings suggest that chromatin folding enables the clock to regulate rhythmic transcription of specific promoters to output temporal transcriptional programs tailored to different tissues.
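A standard first step when calling rhythmic transcripts is a cosinor fit, y(t) = M + a·cos(ωt) + b·sin(ωt) with ω = 2π/24 h. Its amplitude is √(a² + b²) and its peak time is atan2(b, a)/ω. The sketch below uses the closed-form least-squares solution. That solution is exact only for equally spaced samples spanning a whole number of periods, with at least three samples per period. It is offered as general background, not as the analysis performed in this study.

```python
from math import atan2, cos, hypot, pi, sin


def cosinor_fit(times_h: list[float], values: list[float],
                period_h: float = 24.0) -> tuple[float, float, float]:
    """Return (mesor, amplitude, peak_time_h) of y = M + a*cos(wt) + b*sin(wt).

    Closed-form least squares; exact only for equally spaced samples that span
    a whole number of periods with at least three samples per period.
    """
    n = len(values)
    w = 2 * pi / period_h
    mesor = sum(values) / n
    a = 2 / n * sum(y * cos(w * t) for t, y in zip(times_h, values))
    b = 2 / n * sum(y * sin(w * t) for t, y in zip(times_h, values))
    amplitude = hypot(a, b)
    # a*cos(wt) + b*sin(wt) = amplitude*cos(wt - phi) with phi = atan2(b, a).
    peak_time = (atan2(b, a) / w) % period_h
    return mesor, amplitude, peak_time


# Synthetic profile: sampled every 4 h for 48 h, mean 5, amplitude 2, peak at 6 h.
times = [4.0 * k for k in range(12)]
values = [5 + 2 * cos(2 * pi * (t - 6) / 24) for t in times]
print(cosinor_fit(times, values))  # approximately (5.0, 2.0, 6.0)
```

For unevenly spaced or incomplete time courses, the same model would instead be fitted by general linear least squares.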
The nuclear matrix protein HNRNPU maintains 3D genome architecture globally in mouse hepatocytes Genome Res. (IF 11.922) Pub Date : 2018-02-01 Hui Fan; Pin Lv; Xiangru Huo; Jicheng Wu; Qianfeng Wang; Lu Cheng; Yun Liu; Qi-Qun Tang; Ling Zhang; Feng Zhang; Xiaoqi Zheng; Hao Wu; Bo Wen
Eukaryotic chromosomes are folded into higher-order conformations to coordinate genome functions. In addition to long-range chromatin loops, recent chromosome conformation capture (3C)-based studies have revealed higher-order chromatin structures, including compartments and topologically associating domains (TADs), which may serve as units of genome organization and function. However, the molecular machinery underlying these hierarchical three-dimensional (3D) chromatin architectures remains poorly understood. Using high-throughput assays, including in situ Hi-C, DamID, ChIP-seq, and RNA-seq, we investigated the role of the Heterogeneous Nuclear Ribonucleoprotein U (HNRNPU), a nuclear matrix (NM)-associated protein, in 3D genome organization. Upon depletion of HNRNPU in mouse hepatocytes, the genomic coverage of lamina-associated domains (LADs) increased from 53.1% to 68.6%, and chromatin underwent global condensation. Furthermore, disruption of HNRNPU led to compartment switching across 7.5% of the genome, decreased TAD boundary strengths at borders between A (active) and B (inactive) compartments, and reduced chromatin loop intensities. Long-range chromatin interactions between and within compartments or TADs were also significantly remodeled upon HNRNPU depletion. Intriguingly, HNRNPU mainly associates with active chromatin, and 80% of HNRNPU peaks coincide with the binding of CTCF or RAD21. Collectively, these results demonstrate that HNRNPU functions as a major factor maintaining 3D chromatin architecture and suggest important roles for NM-associated proteins in genome organization.
Transcription rate strongly affects splicing fidelity and cotranscriptionality in budding yeast Genome Res. (IF 11.922) Pub Date : 2018-02-01 Vahid Aslanzadeh; Yuanhua Huang; Guido Sanguinetti; Jean D. Beggs
The functional consequences of altering the transcription rate for alternative splicing have been the subject of intensive study in mammalian cells, but less is known about how changing the transcription rate affects splicing in yeast. We present several lines of evidence showing that slow RNA polymerase II elongation increases both cotranscriptional splicing and splicing efficiency and that faster elongation reduces cotranscriptional splicing and splicing efficiency in budding yeast, suggesting that splicing is more efficient when cotranscriptional. Moreover, we demonstrate that altering the RNA polymerase II elongation rate in either direction compromises splicing fidelity, and we reveal that splicing fidelity depends largely on intron length together with secondary structure and splice site score. These effects are notably stronger for the highly expressed ribosomal protein-coding transcripts. We propose that transcription by RNA polymerase II is tuned to optimize the efficiency and accuracy of ribosomal protein gene expression, while allowing flexibility in splice site choice for nonribosomal protein transcripts.
Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data Genome Res. (IF 11.922) Pub Date : 2018-02-01 Pieter Spealman; Armaghan W. Naik; Gemma E. May; Scott Kuersten; Lindsay Freeberg; Robert F. Murphy; Joel McManus
Upstream open reading frames (uORFs), located in transcript leaders (5′ UTRs), are potent cis-acting regulators of translation and mRNA turnover. Recent genome-wide ribosome profiling studies suggest that thousands of uORFs initiate with non-AUG start codons. Although intriguing, these non-AUG uORF predictions have been made without statistical control or validation; thus, the importance of these elements remains to be demonstrated. To address this, we took a comparative genomics approach to study AUG and non-AUG uORFs. We mapped transcript leaders in multiple Saccharomyces yeast species and applied a novel machine learning algorithm (uORF-seqr) to ribosome profiling data to identify statistically significant uORFs. We found that both AUG and non-AUG uORFs are common in Saccharomyces yeasts. Although most non-AUG uORFs are found in only one species, hundreds have either conserved sequence or conserved position within Saccharomyces. uORFs initiating with UUG are particularly common and are shared between species at rates similar to those of AUG uORFs. However, non-AUG uORFs are translated less efficiently than AUG uORFs and are less subject to removal via alternative transcription initiation under normal growth conditions. These results suggest that a subset of non-AUG uORFs may play important roles in regulating gene expression.
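The comparison of AUG and non-AUG uORFs depends on how candidate uORFs are enumerated in the first place. As a purely sequence-based illustration, the sketch below lists ORFs that start at AUG or at one of the nine near-cognate codons differing from AUG at a single position, UUG among them. Each ORF must end at an in-frame stop codon inside the leader. This is not uORF-seqr, which scores ribosome profiling data. Skipping ORFs that run into the main coding sequence and the two-codon minimum length are simplifying assumptions.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def near_cognate_starts() -> set[str]:
    """AUG plus every codon differing from AUG at exactly one position (10 codons)."""
    codons = {"AUG"}
    for pos in range(3):
        for base in "ACGU":
            if base != "AUG"[pos]:
                codons.add("AUG"[:pos] + base + "AUG"[pos + 1:])
    return codons


def find_uorfs(leader: str, starts: Optional[set[str]] = None,
               min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, start_codon) for ORFs that begin and end inside the leader.

    end is exclusive and includes the stop codon. ORFs whose first in-frame stop
    lies beyond the leader (i.e. overlapping the main ORF) are skipped.
    """
    leader = leader.upper().replace("T", "U")
    if starts is None:
        starts = near_cognate_starts()
    uorfs = []
    for i in range(len(leader) - 2):
        codon = leader[i:i + 3]
        if codon not in starts:
            continue
        for j in range(i + 3, len(leader) - 2, 3):
            if leader[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:  # codons before the stop, start included
                    uorfs.append((i, j + 3, codon))
                break
    return uorfs


print(find_uorfs("GGAUGCCCUAAGG"))  # [(2, 11, 'AUG')]
print(find_uorfs("UUGAAAUAG"))      # [(0, 9, 'UUG')]
```

Sequence-based enumeration like this over-predicts heavily once near-cognate starts are allowed, which is why statistical evidence of translation is needed to call a uORF.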
Precise and efficient nucleotide substitution near genomic nick via noncanonical homology-directed repair Genome Res. (IF 11.922) Pub Date : 2018-02-01 Kazuhiro Nakajima; Yue Zhou; Akiko Tomita; Yoshihiro Hirade; Channabasavaiah B. Gurumurthy; Shinichiro Nakada
CRISPR/Cas9, which generates DNA double-strand breaks (DSBs) at target loci, is a powerful tool for editing genomes when codelivered with a donor DNA template. However, DSBs, which are the most deleterious type of DNA damage, often result in unintended nucleotide insertions/deletions (indels) via mutagenic nonhomologous end joining. We developed a strategy for precise gene editing that does not generate DSBs. We show that a combination of single nicks in the target gene and donor plasmid (SNGD) using Cas9D10A nickase promotes efficient nucleotide substitution by gene editing. Nicking the target gene alone did not facilitate efficient gene editing. However, an additional nick in the donor plasmid backbone markedly improved gene-editing efficiency. SNGD-mediated gene editing produced a markedly lower indel frequency than the DSB-mediated approach. We also show that SNGD promotes gene editing at endogenous loci in human cells. Mechanistically, SNGD-mediated gene editing requires long-sequence homology between the target gene and repair template but does not require CtIP, RAD51, or RAD52. These findings suggest that SNGD-mediated gene editing proceeds through noncanonical homology-directed repair. In summary, SNGD promotes precise and efficient gene editing and may be a promising basis for novel gene therapy approaches.
Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome Genome Res. (IF 11.922) Pub Date : 2018-02-01 Hagen Tilgner; Fereshteh Jahanbani; Ishaan Gupta; Paul Collier; Eric Wei; Morten Rasmussen; Michael Snyder
Understanding transcriptome complexity is crucial for understanding human biology and disease. Technologies such as synthetic long-read RNA sequencing (SLR-RNA-seq) have delivered 5 million isoforms and allowed splicing coordination to be assessed. Pacific Biosciences and Oxford Nanopore platforms also increase throughput but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k–200k partitions of 10–200 molecules at a time, enabling analysis of 10–100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces the number of sequencing lanes and the cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., BIN1), the greater depth reveals many new cases, such as MAPT. Coordination of internal exons is extensive among protein-coding genes: 23.5%–59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role for splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein–protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new types of splicing coordination, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.
Integrated analysis of motif activity and gene expression changes of transcription factors Genome Res. (IF 11.922) Pub Date : 2018-02-01 Jesper Grud Skat Madsen; Alexander Rauch; Elvira Laila Van Hauwaert; Søren Fisker Schmidt; Marc Winnefeld; Susanne Mandrup
The ability to predict transcription factors based on sequence information in regulatory elements is a key step in systems-level investigation of transcriptional regulation. Here, we have developed a novel tool, IMAGE, for precise prediction of causal transcription factors based on transcriptome profiling and genome-wide maps of enhancer activity. High precision is obtained by combining a near-complete database of position weight matrices (PWMs), generated by compiling public databases and systematic prediction of PWMs for uncharacterized transcription factors, with a state-of-the-art method for PWM scoring and a novel machine learning strategy, based on both enhancers and promoters, to predict the contribution of motifs to transcriptional activity. We applied IMAGE to published data obtained during 3T3-L1 adipocyte differentiation and showed that IMAGE predicts causal transcriptional regulators of this process with higher confidence than existing methods. Furthermore, we generated genome-wide maps of enhancer activity and transcripts during human mesenchymal stem cell commitment and adipocyte differentiation and used IMAGE to identify positive and negative transcriptional regulators of this process. Collectively, our results demonstrate that IMAGE is a powerful and precise method for prediction of regulators of gene expression.
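IMAGE relies on a large PWM collection and an advanced scoring method that the abstract does not detail. For orientation, the sketch below shows only the basic log-odds PWM score that such methods commonly build on. Per-position counts are converted to log2 odds against a background, and the best window on either strand is reported. The pseudocount of 0.5, the uniform background, and the restriction to A/C/G/T input are assumptions.

```python
from math import log2

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts (a list of dicts) into a log2-odds matrix."""
    if background is None:
        background = {base: 0.25 for base in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in BASES) + 4 * pseudocount
        pwm.append({
            base: log2((column.get(base, 0) + pseudocount) / total / background[base])
            for base in BASES
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, offset, strand) for the best window on either strand.

    For strand '-', offset is measured along the reverse-complemented sequence.
    Returns None if seq is shorter than the motif.
    """
    seq = seq.upper()
    revcomp = "".join(COMPLEMENT[base] for base in reversed(seq))
    width = len(pwm)
    best = None
    for strand, s in (("+", seq), ("-", revcomp)):
        for i in range(len(s) - width + 1):
            score = sum(pwm[k][s[i + k]] for k in range(width))
            if best is None or score > best[0]:
                best = (score, i, strand)
    return best


pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])  # toy 3-bp motif "TGA"
print(best_hit(pwm, "CCTGACC"))  # best window "TGA" at offset 2 on '+'
```

Methods that predict motif contributions to transcriptional activity typically aggregate scores like these across many enhancers and promoters before regressing them against activity.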
Detecting differential copy number variation between groups of samples Genome Res. (IF 11.922) Pub Date : 2018-02-01 Craig B. Lowe; Nicelio Sanchez-Luege; Timothy R. Howes; Shannon D. Brady; Rhea R. Daugherty; Felicity C. Jones; Michael A. Bell; David M. Kingsley
We present a method to detect copy number variants (CNVs) that are differentially present between two groups of sequenced samples. We use a finite-state transducer where the emitted read depth is conditioned on the mappability and GC-content of all reads that occur at a given base position. In this model, the read depth within a region is a mixture of binomials, which in simulations matches the read depth more closely than the often-used negative binomial distribution. The method analyzes all samples simultaneously, preserving uncertainty as to the breakpoints and magnitude of CNVs present in an individual when it identifies CNVs differentially present between the two groups. We apply this method to identify CNVs that are recurrently associated with postglacial adaptation of marine threespine stickleback (Gasterosteus aculeatus) to freshwater. We identify 6664 regions of the stickleback genome, totaling 1.7 Mbp, which show consistent copy number differences between marine and freshwater populations. These deletions and duplications affect both protein-coding genes and cis-regulatory elements, including a noncoding intronic telencephalon enhancer of DCHS1. The functions of the genes near or included within the 6664 CNVs are enriched for immunity and muscle development, as well as head and limb morphology. Although freshwater stickleback have repeatedly evolved from marine populations, we show that freshwater stickleback also act as reservoirs for ancient ancestral sequences that are highly conserved among distantly related teleosts, but largely missing from marine stickleback due to recent selective sweeps in marine populations.
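The method models read depth in a region as a mixture of binomials rather than a negative binomial. The sketch below evaluates such a mixture and picks the copy number whose scaled mixture best explains a handful of observed depths. The component parameters, the rule that copy number c multiplies each per-read probability by c/2, and the function names are hypothetical. The authors' finite-state transducer and its mappability and GC conditioning are not reproduced.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow."""
    if k < 0 or k > n:
        return 0.0
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def mixture_pmf(k: int, components: list[tuple[float, int, float]]) -> float:
    """P(depth = k) for a mixture given as (weight, n, p) triples; weights sum to 1."""
    return sum(w * binom_pmf(k, n, p) for w, n, p in components)


def best_copy_number(depths, components, states=(1, 2, 3, 4), ploidy=2):
    """Copy number whose scaled mixture maximizes the log-likelihood of the depths.

    Hypothetical scaling: copy number c multiplies each component's p by c / ploidy.
    """
    def log_likelihood(c):
        scaled = [(w, n, min(1.0, p * c / ploidy)) for w, n, p in components]
        total = 0.0
        for d in depths:
            prob = mixture_pmf(d, scaled)
            if prob == 0.0:
                return float("-inf")
            total += log(prob)
        return total

    return max(states, key=log_likelihood)


# Two hypothetical classes of positions with slightly different read probabilities.
components = [(0.7, 200, 0.10), (0.3, 200, 0.12)]
print(best_copy_number([40, 41, 42, 43, 44], components))  # 4
```

A mixture lets positions with different mappability or GC content contribute reads at different rates. A single negative binomial would instead absorb that heterogeneity into one overdispersion parameter.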
MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome Genome Res. (IF 11.922) Pub Date : 2018-02-01 John R. Tyson; Nigel J. O'Neil; Miten Jain; Hugh E. Olsen; Philip Hieter; Terrance P. Snutch
Advances in long-read single-molecule sequencing have opened new possibilities for ‘benchtop’ whole-genome sequencing. The Oxford Nanopore Technologies MinION is a portable device that uses nanopore technology to sequence DNA molecules directly. MinION single-molecule long reads are well suited for de novo assembly of complex genomes, as they facilitate the construction of highly contiguous genome maps, obviating the need for labor-intensive physical genome mapping. Long reads can also be used to delineate complex chromosomal rearrangements, such as those that occur in tumor cells, which can confound analysis using short reads. Here, we assessed the feasibility of using MinION long reads for (1) the de novo assembly of a large complex genome and (2) the elucidation of complex rearrangements. The genomes of two Caenorhabditis elegans strains, a wild-type strain and a strain containing two complex rearrangements, were sequenced with MinION. Up to 42-fold coverage was obtained from a single flow cell, and the best pooled data assembly produced a highly contiguous wild-type C. elegans genome containing 48 contigs (N50 contig length = 3.99 Mb) covering >99% of the 100,286,401-base reference genome. Further, the MinION-derived genome assembly expanded the C. elegans reference genome by >2 Mb owing to a more accurate determination of repetitive sequence elements, and it also yielded the complete genomes of two co-extracted bacteria. MinION long-read sequence data also facilitated the elucidation of complex rearrangements in a mutagenized strain. The sequence accuracy of the MinION long-read contigs (∼98%) was improved by using Illumina-derived sequence data to polish the final genome assembly to 99.8% nucleotide accuracy relative to the reference assembly.
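Two of the numbers in this abstract are easy to reproduce by hand. Fold coverage is total sequenced bases divided by genome size, so 42-fold coverage of the 100,286,401-base reference corresponds to 42 × 100,286,401 = 4,212,028,842 sequenced bases (about 4.2 Gb). N50 is the length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal sketch of both, with hypothetical function names:

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: bases sequenced per base of genome."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


print(42 * 100_286_401)                           # 4212028842
print(fold_coverage(4_212_028_842, 100_286_401))  # 42.0
print(n50([5, 4, 3, 2, 1]))                       # 4
```

In the toy assembly, the two longest contigs (5 + 4 = 9 of 15 bases) are the first to pass the halfway mark, so N50 is 4.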
Slightly deleterious genomic variants and transcriptome perturbations in Down syndrome embryonic selection Genome Res. (IF 11.922) Pub Date : 2018-01-01 Konstantin Popadin; Stephan Peischl; Marco Garieri; M. Reza Sailani; Audrey Letourneau; Federico Santoni; Samuel W. Lukowski; Georgii A. Bazykin; Sergey Nikolaev; Diogo Meyer; Laurent Excoffier; Alexandre Reymond; Stylianos E. Antonarakis
The majority of aneuploid fetuses are spontaneously miscarried. Nevertheless, some aneuploid individuals survive despite the strong genetic insult. Here, we investigate whether the survival probability of aneuploid fetuses is affected by the genome-wide burden of slightly deleterious variants. We analyzed two cohorts of live-born Down syndrome individuals (388 genotyped samples and 16 fibroblast transcriptomes) and observed a deficit of slightly deleterious variants on Chromosome 21 and decreased transcriptome-wide variation in the expression levels of highly constrained genes. We interpret these results as signatures of embryonic selection and propose a genetic handicap model whereby an individual bearing an extremely severe deleterious variant (such as aneuploidy) could escape embryonic lethality if the genome-wide burden of slightly deleterious variants is sufficiently low. This approach can be used to study the composition and effect of the numerous slightly deleterious variants in humans and model organisms.
Saturation mutagenesis reveals manifold determinants of exon definition Genome Res. (IF 11.922) Pub Date : 2018-01-01 Shengdong Ke; Vincent Anquetil; Jorge Rojas Zamalloa; Alisha Maity; Anthony Yang; Mauricio A. Arias; Sergey Kalachikov; James J. Russo; Jingyue Ju; Lawrence A. Chasin
To illuminate the extent and roles of exonic sequences in the splicing of human RNA transcripts, we conducted saturation mutagenesis of a 51-nt internal exon in a three-exon minigene. All possible single and tandem dinucleotide substitutions were surveyed. Using high-throughput genetics, 5560 minigene molecules were assayed for splicing in human HEK293 cells. Up to 70% of mutations produced substantial (greater than twofold) phenotypes of either increased or decreased splicing. Of all predicted secondary structural elements, only a single 15-nt stem–loop showed a strong correlation with splicing, acting negatively. The in vitro formation of exon–protein complexes between the mutant molecules and proteins associated with spliceosome formation (U2AF35, U2AF65, U1A, and U1-70K) correlated with splicing efficiencies, suggesting that exon definition is the step affected by most mutations. The relative binding affinities of dozens of human RNA-binding protein domains, as reported in the CISBP-RNA database, correlated either positively or negatively with splicing efficiency, and these correlated proteins are more numerous than could bind the 51-nt test exon simultaneously. The large number of these functional protein-binding correlations points to a dynamic and heterogeneous population of pre-mRNA molecules, each responding to a particular collection of binding proteins.
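Suppose a tandem dinucleotide substitution means changing two adjacent bases at once, giving 3 × 3 = 9 variants per adjacent pair. Under that assumption, a 51-nt exon has 51 × 3 = 153 single substitutions and 50 × 9 = 450 tandem substitutions, or 603 designed variants in total. The sketch below enumerates them. The placeholder exon sequence is hypothetical, since the abstract does not give the real one.

```python
BASES = "ACGT"


def single_substitutions(seq: str) -> list[str]:
    """Every variant that differs from seq at exactly one position."""
    return [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in BASES if b != seq[i]]


def tandem_substitutions(seq: str) -> list[str]:
    """Every variant in which two adjacent positions are both changed."""
    return [seq[:i] + b1 + b2 + seq[i + 2:]
            for i in range(len(seq) - 1)
            for b1 in BASES if b1 != seq[i]
            for b2 in BASES if b2 != seq[i + 1]]


exon = ("ACGT" * 13)[:51]  # hypothetical 51-nt placeholder, not the real exon
print(len(single_substitutions(exon)), len(tandem_substitutions(exon)))  # 153 450
```

Every tandem variant differs from the reference at exactly two positions and every single variant at one, so the two sets never overlap.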
Discovery of noncanonical translation initiation sites through mass spectrometric analysis of protein N termini Genome Res. (IF 11.922) Pub Date : 2018-01-01 Chan Hyun Na; Mustafa A. Barbhuiya; Min-Sik Kim; Steven Verbruggen; Stephen M. Eacker; Olga Pletnikova; Juan C. Troncoso; Marc K. Halushka; Gerben Menschaert; Christopher M. Overall; Akhilesh Pandey
Translation initiation generally occurs at AUG codons in eukaryotes, although it has been shown that non-AUG or noncanonical translation initiation can also occur. However, the evidence for noncanonical translation initiation sites (TISs) is largely indirect and based on ribosome profiling (Ribo-seq) studies. Here, using a strategy specifically designed to enrich N termini of proteins, we demonstrate that many human proteins are translated at noncanonical TISs. The large majority of TISs that mapped to 5′ untranslated regions were noncanonical and led to N-terminal extension of annotated proteins or translation of upstream small open reading frames (uORFs). It has been controversial whether the amino acid corresponding to the start codon is incorporated at the TIS or whether methionine is still incorporated. We found that methionine was incorporated at almost all noncanonical TISs identified in this study. Comparison of the TISs determined through mass spectrometry with ribosome profiling data revealed that about two-thirds of the novel annotations were indeed supported by the available ribosome profiling data. Sequence conservation across species and, in some cases, a higher abundance of noncanonical TISs than canonical ones suggest that noncanonical TISs can have biological functions. Overall, this study provides evidence of protein translation initiation at noncanonical TISs and argues that further studies are required to elucidate the functional implications of such noncanonical translation initiation.
H3S10ph broadly marks early-replicating domains in interphase ESCs and shows reciprocal antagonism with H3K9me2 Genome Res. (IF 11.922) Pub Date : 2018-01-01 Carol C.L. Chen; Preeti Goyal; Mohammad M. Karimi; Marie H. Abildgaard; Hiroshi Kimura; Matthew C. Lorincz
Phosphorylation of histone H3 at serine 10 (H3S10ph) by Aurora kinases plays an important role in mitosis; however, H3S10ph also marks regulatory regions of inducible genes in interphase mammalian cells, implying mitosis-independent functions. Using the fluorescent ubiquitin-mediated cell cycle indicator (FUCCI), we found that 30% of the genome in interphase mouse embryonic stem cells (ESCs) is marked with H3S10ph. H3S10ph broadly demarcates gene-rich regions in G1 and is positively correlated with domains of early DNA replication timing (RT) but negatively correlated with H3K9me2 and lamin-associated domains (LADs). Consistent with mitosis-independent kinase activity, this pattern was preserved in ESCs treated with Hesperadin, a potent inhibitor of Aurora B/C kinases. Disruption of H3S10ph by expression of nonphosphorylatable H3.3S10A results in ectopic spreading of H3K9me2 into adjacent euchromatic regions, mimicking the phenotype observed in Drosophila JIL-1 kinase mutants. Conversely, interphase H3S10ph domains expand in Ehmt1 (also known as Glp) null ESCs, revealing that H3S10ph deposition is restricted by H3K9me2. Strikingly, spreading of H3S10ph at RT transition regions (TTRs) is accompanied by aberrant transcription initiation of genes co-oriented with the replication fork in Ehmt1−/− and Ehmt2−/− ESCs, indicating that establishment of repressive chromatin on the leading strand following DNA synthesis may depend upon these lysine methyltransferases. H3S10ph is also anti-correlated with H3K9me2 in interphase murine embryonic fibroblasts (MEFs) and is restricted to intragenic regions of actively transcribed genes by EHMT2. Taken together, these observations suggest that H3S10ph may play a general role in restricting the spreading of repressive chromatin in interphase mammalian cells.
Deep experimental profiling of microRNA diversity, deployment, and evolution across the Drosophila genus Genome Res. (IF 11.922) Pub Date : 2018-01-01 Jaaved Mohammed; Alex S. Flynt; Alexandra M. Panzarino; Md Mosharrof Hossein Mondal; Matthew DeCruz; Adam Siepel; Eric C. Lai
To assess miRNA evolution across the Drosophila genus, we analyzed several billion small RNA reads across 12 fruit fly species. These data permit comprehensive curation of species- and clade-specific variation in miRNA identity, abundance, and processing. Among well-conserved miRNAs, we observed unexpected cases of clade-specific variation in 5′ end precision, occasional antisense loci, and putatively noncanonical loci. We also used strict criteria to identify a large set (649) of novel, evolutionarily restricted miRNAs. Within the bulk collection of species-restricted miRNAs, two notable subpopulations are splicing-derived mirtrons and testes-restricted, recently evolved, clustered (TRC) canonical miRNAs. We quantified miRNA birth and death using our annotation and a phylogenetic model for estimating rates of miRNA turnover. We observed striking differences in birth and death rates across miRNA classes defined by biogenesis pathway, genomic clustering, and tissue restriction, and even identified flux heterogeneity among Drosophila clades. In particular, distinct molecular rationales underlie the distinct evolutionary behavior of different miRNA classes. Mirtrons are associated with high rates of 3′ untemplated addition, a mechanism that impedes their biogenesis, whereas TRC miRNAs appear to evolve under positive selection. Altogether, these data reveal miRNA diversity among Drosophila species and principles underlying their emergence and evolution.
DNA mismatch repair preferentially protects genes from mutation Genome Res. (IF 11.922) Pub Date : 2018-01-01 Eric J. Belfield; Zhong Jie Ding; Fiona J.C. Jamieson; Anne M. Visscher; Shao Jian Zheng; Aziz Mithani; Nicholas P. Harberd
Mutation is the source of genetic variation and fuels biological evolution. Many mutations first arise as DNA replication errors that subsequently evade correction by cellular DNA repair, for example, by the well-known DNA mismatch repair (MMR) mechanism. Here, we determine the genome-wide effects of MMR on mutation. We first identify almost 9000 mutations accumulated over five generations in eight MMR-deficient mutation accumulation (MA) lines of the model plant species, Arabidopsis thaliana. We then show that MMR deficiency greatly increases the frequency of both small-scale insertions and deletions (indels) and single-nucleotide variant (SNV) mutations. Most indels involve A or T nucleotides and occur preferentially in homopolymeric (poly A or poly T) genomic stretches. In addition, we find that the likelihood of occurrence of indels in homopolymeric stretches is strongly related to stretch length, and that this relationship causes ultrahigh localized mutation rates in specific homopolymeric stretch regions. For SNVs, we show that MMR deficiency both increases their frequency and changes their molecular mutational spectrum, causing further enhancement of the GC to AT bias characteristic of organisms with normal MMR function. Our final genome-wide analyses show that MMR deficiency disproportionately increases the numbers of SNVs in genes, rather than in nongenic regions of the genome. This latter observation indicates that MMR preferentially protects genes from mutation and has important consequences for understanding the evolution of genomes during both natural selection and human tumor growth.
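The counts in this abstract translate into a mutation rate with the usual mutation-accumulation arithmetic: mutations divided by lines × generations × callable sites (× ploidy, when both homologous copies are counted). The sketch below applies it to the reported figures; the callable-site count of 1e8 bp is a hypothetical placeholder, not a value from the study, and "almost 9000" is rounded to 9000.

```python
def ma_mutation_rate(n_mutations, n_lines, n_generations, callable_sites, ploidy=2):
    """Per-site, per-generation rate: mutations / (lines x generations x sites x ploidy)."""
    return n_mutations / (n_lines * n_generations * callable_sites * ploidy)


# Counts from the abstract: almost 9000 mutations, eight lines, five generations.
per_line_per_gen = ma_mutation_rate(9000, 8, 5, callable_sites=1, ploidy=1)
print(per_line_per_gen)  # 225.0

# Hypothetical callable genome size of 1e8 bp; not a value reported in the abstract.
per_site = ma_mutation_rate(9000, 8, 5, callable_sites=1e8)
print(per_site)
```

With these inputs each MMR-deficient line gains roughly 225 mutations per generation; the per-site figure depends entirely on the assumed callable-site count and ploidy convention.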
SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells Genome Res. (IF 11.922) Pub Date : 2018-01-01 Kyung Yeon Han; Kyu-Tae Kim; Je-Gun Joung; Dae-Soon Son; Yeon Jeong Kim; Areum Jo; Hyo-Jeong Jeon; Hui-Sung Moon; Chang Eun Yoo; Woosung Chung; Hye Hyeon Eum; Sangmin Kim; Hong Kwan Kim; Jeong Eon Lee; Myung-Ju Ahn; Hae-Ock Lee; Donghyun Park; Woong-Yang Park
Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level.
Genome-wide DNA methylation profiling using the methylation-dependent restriction enzyme LpnPI Genome Res. (IF 11.922) Pub Date : 2018-01-01 Ruben Boers; Joachim Boers; Bas de Hoon; Christel Kockx; Zeliha Ozgur; Anco Molijn; Wilfred van IJcken; Joop Laven; Joost Gribnau
DNA methylation is a well-known epigenetic modification that plays a crucial role in gene regulation, but genome-wide analysis of DNA methylation remains technically challenging and costly. DNA methylation-dependent restriction enzymes can be used to restrict CpG methylation analysis to methylated regions of the genome only, which significantly reduces the required sequencing depth and simplifies subsequent bioinformatics analysis. Unfortunately, this approach has been hampered by complete digestion of DNA in CpG methylation-dense regions, resulting in fragments that are too small for accurate mapping. Here, we show that the activity of the DNA methylation-dependent enzyme LpnPI is blocked when the fragment size is smaller than 32 bp. This unique property prevents complete digestion of methylation-dense DNA and allows accurate genome-wide analysis of CpG methylation at single-nucleotide resolution. Methylated DNA sequencing (MeD-seq) of LpnPI-digested fragments revealed highly reproducible genome-wide CpG methylation profiles for >50% of all potentially methylated CpGs, at a sequencing depth less than one-tenth of that required for whole-genome bisulfite sequencing (WGBS). MeD-seq identified a high number of patient- and tissue-specific differentially methylated regions (DMRs) and revealed that patient-specific DMRs observed in both blood and buccal samples predict DNA methylation in other tissues and organs. We also observed highly variable DNA methylation at gene promoters on the inactive X Chromosome, indicating tissue-specific and interpatient-specific escape of X Chromosome inactivation. These findings highlight the potential of MeD-seq for high-throughput epigenetic profiling.
ABCA4 midigenes reveal the full splice spectrum of all reported noncanonical splice site variants in Stargardt disease Genome Res. (IF 11.922) Pub Date : 2018-01-01 Riccardo Sangermano; Mubeen Khan; Stéphanie S. Cornelis; Valerie Richelle; Silvia Albert; Alejandro Garanto; Duaa Elmelik; Raheel Qamar; Dorien Lugtenberg; L. Ingeborgh van den Born; Rob W.J. Collin; Frans P.M. Cremers
Stargardt disease is caused by variants in the ABCA4 gene, a significant part of which are noncanonical splice site (NCSS) variants. When a gene of interest is not expressed in available somatic cells, small genomic fragments carrying potential disease-associated variants are tested for splice abnormalities using in vitro splice assays. We recently discovered that when using small minigenes lacking the proper genomic context, in vitro results do not correlate with splice defects observed in patient cells. We therefore devised a novel strategy in which a bacterial artificial chromosome was employed to generate midigenes, splice vectors of varying lengths (up to 11.7 kb) covering almost the entire ABCA4 gene. These midigenes were used to analyze the effect of all 44 reported and three novel NCSS variants on ABCA4 pre-mRNA splicing. Intriguingly, multi-exon skipping events were observed, as well as exon elongation and intron retention. The analysis of all reported NCSS variants in ABCA4 allowed us to reveal the nature of aberrant splicing events and to classify the severity of these mutations based on the residual fraction of wild-type mRNA. Our strategy to generate large overlapping splice vectors carrying multiple exons, creating a toolbox for robust and high-throughput analysis of splice variants, can be applied to all human genes.
SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site Genome Res. (IF 11.922) Pub Date : 2018-01-01 Liyang Zhang; Gabriella D. Martini; H. Tomas Rube; Judith F. Kribelbauer; Chaitanya Rastogi; Vincent D. FitzPatrick; Jon C. Houtman; Harmen J. Bussemaker; Miles A. Pufall
The DNA-binding interfaces of the androgen (AR) and glucocorticoid (GR) receptors are virtually identical, yet these transcription factors share only about a third of their genomic binding sites and regulate similarly distinct sets of target genes. To address this paradox, we determined the intrinsic specificities of the AR and GR DNA-binding domains using a refined version of SELEX-seq. We developed an algorithm, SelexGLM, that quantifies binding specificity over a large (31-bp) binding site by iteratively fitting a feature-based generalized linear model to SELEX probe counts. This analysis revealed that the DNA-binding preferences of AR and GR homodimers differ significantly, both within and outside the 15-bp core binding site. The relative preference between the two factors can be tuned over a wide range by changing the DNA sequence, with AR more sensitive to sequence changes than GR. The specificity of AR extends to the regions flanking the core 15-bp site, where isothermal calorimetry measurements reveal that affinity is augmented by enthalpy-driven readout of poly(A) sequences associated with narrowed minor groove width. We conclude that the increased specificity of AR is correlated with more enthalpy-driven binding than GR. The binding models help explain differences in AR and GR genomic binding and provide a biophysical rationale for how promiscuous binding by GR allows functional substitution for AR in some castration-resistant prostate cancers.
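To illustrate the kind of model the abstract describes, the sketch below fits a feature-based Poisson GLM to toy SELEX-style counts: log E[selected count] = log(input count) + w · x(sequence), where x is a one-hot encoding of each base at each position and the input count acts as an offset so that w captures enrichment. This is a hedged toy, not SelexGLM itself; the feature set, link function, offset, and fitting scheme (plain gradient ascent here) are assumptions, and the published method models a 31-bp site with its own iterative procedure.

```python
import math
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Intercept followed by one indicator per (position, base) pair."""
    x = [1.0]
    for base in seq:
        x.extend(1.0 if base == b else 0.0 for b in BASES)
    return x


def fit_poisson_glm(seqs, input_counts, selected_counts, lr=0.1, n_iter=2000):
    """Fit log E[selected] = log(input) + w . x(seq) by gradient ascent.

    The gradient of the Poisson log-likelihood is sum_i (y_i - mu_i) x_i.
    """
    X = [one_hot(s) for s in seqs]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, c_in, y in zip(X, input_counts, selected_counts):
            mu = c_in * math.exp(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - mu) * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def coef(w, position, base):
    """Look up the weight for `base` at `position` (index 0 is the intercept)."""
    return w[1 + 4 * position + BASES.index(base)]


# Synthetic data: every 2-mer, with A at position 0 giving e-fold enrichment.
seqs = ["".join(p) for p in product(BASES, repeat=2)]
inputs = [1.0] * len(seqs)
selected = [math.exp(1.0 if s[0] == "A" else 0.0) for s in seqs]
w = fit_poisson_glm(seqs, inputs, selected)
print(round(coef(w, 0, "A") - coef(w, 0, "C"), 2))  # close to 1.0
```

Because one-hot features plus an intercept are collinear, only contrasts between bases at the same position (such as A versus C above) are meaningful; individual weights are not.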
Impact of regulatory variation across human iPSCs and differentiated cells Genome Res. (IF 11.922) Pub Date : 2018-01-01 Nicholas E. Banovich; Yang I. Li; Anil Raj; Michelle C. Ward; Peyton Greenside; Diego Calderon; Po Yuan Tung; Jonathan E. Burnett; Marsha Myrthil; Samantha M. Thomas; Courtney K. Burrows; Irene Gallego Romero; Bryan J. Pavlovic; Anshul Kundaje; Jonathan K. Pritchard; Yoav Gilad
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type–specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type–specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type–specific chromatin accessibility.
The landscape of miRNA editing in animals and its impact on miRNA biogenesis and targeting Genome Res. (IF 11.922) Pub Date : 2018-01-01 Lishi Li; Yulong Song; Xinrui Shi; Jianheng Liu; Shaolei Xiong; Wanying Chen; Qiang Fu; Zichao Huang; Nannan Gu; Rui Zhang
Adenosine-to-inosine (A-to-I) RNA editing regulates miRNA biogenesis and function. To date, fewer than 160 miRNA editing sites have been identified. Here, we present a quantitative atlas of miRNA A-to-I editing through the profiling of 201 pri-miRNA samples and 4694 mature miRNA samples in human, mouse, and Drosophila. We identified 4162 sites present in ∼80% of the pri-miRNAs and 574 sites in mature miRNAs. miRNA editing is prevalent in many tissue types in human. However, high-level editing is mostly found in neuronal tissues in mouse and Drosophila. Interestingly, the edited miRNAs in neuronal and non-neuronal tissues in human gain two distinct sets of new targets, which are significantly associated with cognitive and organ developmental functions, respectively. Furthermore, we reveal that miRNA editing profoundly affects asymmetric strand selection. Altogether, these data provide insight into the impact of RNA editing on miRNA biology and suggest that miRNA editing has recently gained non-neuronal functions in human.
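Because reverse transcriptase reads inosine as guanosine, A-to-I editing at a site is commonly quantified as the fraction of G reads among A and G reads. The sketch below computes that level and applies coverage and level filters; the thresholds and site identifiers are hypothetical illustrations, not the cutoffs used in this atlas.

```python
def editing_level(a_reads, g_reads):
    """Fraction of reads showing G (the read-out of inosine) at a genomic A site."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no A or G reads cover this site")
    return g_reads / total


def confident_sites(site_counts, min_coverage=20, min_level=0.01):
    """Keep sites with enough A+G coverage and a non-trivial editing level.

    site_counts maps a site identifier to (a_reads, g_reads); the thresholds
    are illustrative defaults, not values from the study.
    """
    kept = {}
    for site, (a, g) in site_counts.items():
        if a + g >= min_coverage:
            level = editing_level(a, g)
            if level >= min_level:
                kept[site] = level
    return kept


counts = {"site1": (80, 20), "site2": (5, 5), "site3": (100, 0)}
print(confident_sites(counts))  # {'site1': 0.2}
```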
Corrigendum: Discovery and genotyping of structural variation from long-read haploid genome sequence data Genome Res. (IF 11.922) Pub Date : 2018-01-01 John Huddleston; Mark J.P. Chaisson; Karyn Meltz Steinberg; Wes Warren; Kendra Hoekzema; David Gordon; Tina A. Graves-Lindsay; Katherine M. Munson; Zev N. Kronenberg; Laura Vives; Paul Peluso; Matthew Boitano; Chen-Shin Chin; Jonas Korlach; Richard K. Wilson; Evan E. Eichler
Genome Research 27: 677–685 (2017)
Sex-biased microRNA expression in mammals and birds reveals underlying regulatory mechanisms and a role in dosage compensation Genome Res. (IF 11.922) Pub Date : 2017-12-01 Maria Warnefors; Katharina Mössinger; Jean Halbert; Tania Studer; John L. VandeBerg; Isa Lindgren; Amir Fallahshahroudi; Per Jensen; Henrik Kaessmann
Sexual dimorphism depends on sex-biased gene expression, but the contributions of microRNAs (miRNAs) have not been globally assessed. We therefore produced an extensive small RNA sequencing data set to analyze male and female miRNA expression profiles in mouse, opossum, and chicken. Our analyses uncovered numerous cases of somatic sex-biased miRNA expression, with the largest proportion found in the mouse heart and liver. Sex-biased expression is explained by miRNA-specific regulation, including sex-biased chromatin accessibility at promoters, rather than piggybacking of intronic miRNAs on sex-biased protein-coding genes. In mouse, but not opossum and chicken, sex bias is coordinated across tissues such that autosomal testis-biased miRNAs tend to be somatically male-biased, whereas autosomal ovary-biased miRNAs are female-biased, possibly due to broad hormonal control. In chicken, which has a Z/W sex chromosome system, expression output of genes on the Z Chromosome is expected to be male-biased, since there is no global dosage compensation mechanism that restores expression in ZW females after almost all genes on the W Chromosome decayed. Nevertheless, we found that the dominant liver miRNA, miR-122-5p, is Z-linked but expressed in an unbiased manner, due to the unusual retention of a W-linked copy. Another Z-linked miRNA, the male-biased miR-2954-3p, shows conserved preference for dosage-sensitive genes on the Z Chromosome, based on computational and experimental data from chicken and zebra finch, and acts to equalize male-to-female expression ratios of its targets. Unexpectedly, our findings thus establish miRNA regulation as a novel gene-specific dosage compensation mechanism.
Convergent origination of a Drosophila-like dosage compensation mechanism in a reptile lineage Genome Res. (IF 11.922) Pub Date : 2017-12-01 Ray Marin; Diego Cortez; Francesco Lamanna; Madapura M. Pradeepa; Evgeny Leushkin; Philippe Julien; Angélica Liechti; Jean Halbert; Thoomke Brüning; Katharina Mössinger; Timo Trefzer; Christian Conrad; Halie N. Kerver; Juli Wade; Patrick Tschopp; Henrik Kaessmann
Sex chromosomes differentiated from different ancestral autosomes in various vertebrate lineages. Here, we trace the functional evolution of the XY Chromosomes of the green anole lizard (Anolis carolinensis), on the basis of extensive high-throughput genome, transcriptome and histone modification sequencing data and revisit dosage compensation evolution in representative mammals and birds with substantial new expression data. Our analyses show that Anolis sex chromosomes represent an ancient XY system that originated at least ≈160 million years ago in the ancestor of Iguania lizards, shortly after the separation from the snake lineage. The age of this system approximately coincides with the ages of the avian and two mammalian sex chromosome systems. To compensate for the almost complete Y Chromosome degeneration, X-linked genes have become twofold up-regulated, restoring ancestral expression levels. The highly efficient dosage compensation mechanism of Anolis represents the only vertebrate case identified so far to fully support Ohno's original dosage compensation hypothesis. Further analyses reveal that X up-regulation occurs only in males and is mediated by a male-specific chromatin machinery that leads to global hyperacetylation of histone H4 at lysine 16 specifically on the X Chromosome. The green anole dosage compensation mechanism is highly reminiscent of that of the fruit fly, Drosophila melanogaster. Altogether, our work unveils the convergent emergence of a Drosophila-like dosage compensation mechanism in an ancient reptilian sex chromosome system and highlights that the evolutionary pressures imposed by sex chromosome dosage reductions in different amniotes were resolved in fundamentally different ways.
Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations Genome Res. (IF 11.922) Pub Date : 2017-12-01 Zoe June Assaf; Susanne Tilk; Jane Park; Mark L. Siegal; Dmitri A. Petrov
Mutations provide the raw material of evolution, and thus our ability to study evolution depends fundamentally on having precise measurements of mutational rates and patterns. We generate a data set for this purpose using (1) de novo mutations from mutation accumulation experiments and (2) extremely rare polymorphisms from natural populations. The first, mutation accumulation (MA) lines, are the product of maintaining flies in tiny populations for many generations, which renders natural selection ineffective and allows new mutations to accrue in the genome. The second, rare genetic variation from natural populations, allows the study of mutation because extremely rare polymorphisms are relatively unaffected by the filter of natural selection. We use both methods in Drosophila melanogaster, first generating our own novel data set of sequenced MA lines and performing a meta-analysis of all published MA mutations (∼2000 events) and then identifying a high quality set of ∼70,000 extremely rare (≤0.1%) polymorphisms that are fully validated with resequencing. We use these data sets to precisely measure mutational rates and patterns. Highlights of our results include a high rate of multinucleotide mutation events at both short (∼5 bp) and long (∼1 kb) genomic distances, evidence that mutation drives GC content lower in already GC-poor regions, and the use of our precise context-dependent mutation rates to predict long-term evolutionary patterns at synonymous sites. We also show that de novo mutations from independent MA experiments display similar patterns of single-nucleotide mutation and closely match the patterns of mutation found in natural populations.
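A context-dependent mutation rate relates how often a base mutates to the number of times its sequence context occurs. The sketch below computes, for each trinucleotide context, the fraction of occurrences whose middle base was mutated. It assumes 0-based positions, at most one mutation per position, and no collapsing of reverse-complement contexts; dividing further by lines and generations would turn these fractions into per-generation rates.

```python
from collections import Counter


def context_mutation_rates(reference, mutation_positions):
    """Fraction of occurrences of each trinucleotide context whose middle base mutated.

    Assumptions: positions are 0-based offsets into `reference`, each position
    mutated at most once, and reverse-complement contexts are NOT collapsed.
    """
    opportunities = Counter(
        reference[i - 1:i + 2] for i in range(1, len(reference) - 1)
    )
    hits = Counter(
        reference[p - 1:p + 2]
        for p in mutation_positions
        if 0 < p < len(reference) - 1
    )
    return {ctx: hits[ctx] / n for ctx, n in opportunities.items()}


rates = context_mutation_rates("ACGACGT", [1])
print(rates)  # {'ACG': 0.5, 'CGA': 0.0, 'GAC': 0.0, 'CGT': 0.0}
```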
Comparative genome analysis of programmed DNA elimination in nematodes Genome Res. (IF 11.922) Pub Date : 2017-12-01 Jianbin Wang; Shenghan Gao; Yulia Mostovoy; Yuanyuan Kang; Maxim Zagoskin; Yongqiao Sun; Bing Zhang; Laura K. White; Alice Easton; Thomas B. Nutman; Pui-Yan Kwok; Songnian Hu; Martin K. Nielsen; Richard E. Davis
Programmed DNA elimination is a developmentally regulated process leading to the reproducible loss of specific genomic sequences. DNA elimination occurs in unicellular ciliates and a variety of metazoans, including invertebrates and vertebrates. In metazoa, DNA elimination typically occurs in somatic cells during early development, leaving the germline genome intact. Reference genomes for metazoa that undergo DNA elimination are not available. Here, we generated germline and somatic reference genome sequences of the DNA eliminating pig parasitic nematode Ascaris suum and the horse parasite Parascaris univalens. In addition, we carried out in-depth analyses of DNA elimination in the parasitic nematode of humans, Ascaris lumbricoides, and the parasitic nematode of dogs, Toxocara canis. Our analysis of nematode DNA elimination reveals that in all species, repetitive sequences (that differ among the genera) and germline-expressed genes (approximately 1000–2000 or 5%–10% of the genes) are eliminated. Thirty-five percent of these eliminated genes are conserved among these nematodes, defining a core set of eliminated genes that are preferentially expressed during spermatogenesis. Our analysis supports the view that DNA elimination in nematodes silences germline-expressed genes. Over half of the chromosome break sites are conserved between Ascaris and Parascaris, whereas only 10% are conserved in the more divergent T. canis. Analysis of the chromosomal breakage regions suggests a sequence-independent mechanism for DNA breakage followed by telomere healing, with the formation of more accessible chromatin in the break regions prior to DNA elimination. Our genome assemblies and annotations also provide comprehensive resources for analysis of DNA elimination, parasitology research, and comparative nematode genome and epigenome studies.
Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences Genome Res. (IF 11.922) Pub Date : 2017-12-01 Josh T. Cuperus; Benjamin Groves; Anna Kuchina; Alexander B. Rosenberg; Nebojsa Jojic; Stanley Fields; Georg Seelig
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
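The abstract names uORFs among the sequence features whose effect the data quantify. As a small feature-extraction sketch (not the CNN described above), the code below classifies each ATG in a 5′ UTR, assumed to end immediately before the main start codon, by whether an in-frame stop occurs within the UTR. Stop codons that span the UTR/CDS boundary are not checked, which is a simplification.

```python
STOPS = {"TAA", "TAG", "TGA"}


def classify_upstream_atgs(utr):
    """Classify each ATG in a 5' UTR (DNA alphabet) that ends at the main start codon.

    'uORF'                     an in-frame stop codon occurs before the UTR ends
    'in-frame extension'       no stop, and the ATG is in frame with the main start
    'overlapping out-of-frame' no stop, and the ATG is out of frame
    """
    results = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != "ATG":
            continue
        kind = None
        for j in range(i + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOPS:
                kind = "uORF"
                break
        if kind is None:
            in_frame = (len(utr) - i) % 3 == 0
            kind = "in-frame extension" if in_frame else "overlapping out-of-frame"
        results.append((i, kind))
    return results


print(classify_upstream_atgs("CCATGAAATAGCC"))  # [(2, 'uORF')]
```

Features like these could be supplied to a simpler model for comparison, whereas a CNN learns comparable patterns directly from the raw sequence.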
A novel approach for data integration and disease subtyping Genome Res. (IF 11.922) Pub Date : 2017-12-01 Tin Nguyen; Rebecca Tagett; Diana Diaz; Sorin Draghici
Advances in high-throughput technologies allow for measurements of many types of omics data, yet the meaningful integration of several different data types remains a significant challenge. Another important and difficult problem is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. Here we present a novel approach, called perturbation clustering for data integration and disease subtyping (PINS), which is able to address both challenges. The framework has been validated on thousands of cancer samples, using gene expression, DNA methylation, noncoding microRNA, and copy number variation data available from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. This simultaneous subtyping approach accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. The results were obtained from genome-scale molecular data without any other type of prior knowledge. The approach is sufficiently general to replace existing unsupervised clustering approaches outside the scope of biomedical research, with the additional ability to integrate multiple types of data.
Rapid molecular assays to study human centromere genomics Genome Res. (IF 11.922) Pub Date : 2017-12-01 Rafael Contreras-Galindo; Sabrina Fischer; Anjan K. Saha; John D. Lundy; Patrick W. Cervantes; Mohamad Mourad; Claire Wang; Brian Qian; Manhong Dai; Fan Meng; Arul Chinnaiyan; Gilbert S. Omenn; Mark H. Kaplan; David M. Markovitz
The centromere is the structural unit responsible for the faithful segregation of chromosomes. Although regulation of centromeric function by epigenetic factors has been well-studied, the contributions of the underlying DNA sequences have been much less well defined, and existing methodologies for studying centromere genomics in biology are laborious. We have identified specific markers in the centromere of 23 of the 24 human chromosomes that allow for rapid PCR assays capable of capturing the genomic landscape of human centromeres at a given time. Use of this genetic strategy can also delineate which specific centromere arrays in each chromosome drive the recruitment of epigenetic modulators. We further show that, surprisingly, loss and rearrangement of DNA in centromere 21 is associated with trisomy 21. This new approach can thus be used to rapidly take a snapshot of the genetics and epigenetics of each specific human centromere in nondisjunction disorders and other biological settings.
GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly Genome Res. (IF 11.922) Pub Date : 2017-12-01 Daniel L. Cameron; Jan Schröder; Jocelyn Sietsma Penington; Hongdo Do; Ramyar Molania; Alexander Dobrovic; Terence P. Speed; Anthony T. Papenfuss
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
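As a toy illustration of the general idea behind a positional de Bruijn graph, the sketch below attaches a genomic position to every k-mer node, so the same k-mer observed at two loci becomes two distinct nodes and edges only connect adjacent positions. This is not GRIDSS's assembler: the read start positions are assumed inputs, and GRIDSS's actual graph (with position intervals and evidence weighting) is considerably more elaborate.

```python
from collections import defaultdict


def positional_de_bruijn(reads, k):
    """Build a toy positional de Bruijn graph.

    reads: iterable of (sequence, start) pairs, where `start` is an assumed
    reference position for the first base of the read.
    Nodes are (k-mer, position) pairs; edge weights count supporting reads.
    """
    edges = defaultdict(int)
    for seq, start in reads:
        for i in range(len(seq) - k):
            node = (seq[i:i + k], start + i)
            successor = (seq[i + 1:i + k + 1], start + i + 1)
            edges[(node, successor)] += 1
    return dict(edges)


graph = positional_de_bruijn([("ACGTA", 10), ("CGTAC", 11)], k=3)
for (src, dst), weight in graph.items():
    print(src, "->", dst, weight)
```

Keeping positions on the nodes prevents reads from unrelated loci that share a k-mer from being merged into one path.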
Cre-dependent Cas9-expressing pigs enable efficient in vivo genome editing Genome Res. (IF 11.922) Pub Date : 2017-12-01 Kepin Wang; Qin Jin; Degong Ruan; Yi Yang; Qishuai Liu; Han Wu; Zhiwei Zhou; Zhen Ouyang; Zhaoming Liu; Yu Zhao; Bentian Zhao; Quanjun Zhang; Jiangyun Peng; Chengdan Lai; Nana Fan; Yanhui Liang; Ting Lan; Nan Li; Xiaoshan Wang; Xinlu Wang; Yong Fan; Pieter A. Doevendans; Joost P.G. Sluijter; Pentao Liu; Xiaoping Li; Liangxue Lai
Despite being time-consuming and costly, generating genome-edited pigs holds great promise for agricultural, biomedical, and pharmaceutical applications. To further facilitate genome editing in pigs, we report here the establishment of a pig line with Cre-inducible Cas9 expression that allows a variety of ex vivo genome edits in fibroblast cells, including single- and multigene modifications and chromosome rearrangements, as well as efficient in vivo genetic modifications. As a proof of principle, we were able to simultaneously inactivate five tumor suppressor genes (TP53, PTEN, APC, BRCA1, and BRCA2) and activate one oncogene (KRAS), achieved by delivering Cre recombinase and sgRNAs, which caused rapid lung tumor development. The efficient genome editing shown here demonstrates that these pigs can serve as a powerful tool for dissecting in vivo gene functions and biological processes in a temporal manner and for streamlining the production of genome-edited pigs for disease modeling.
Nanopore sequencing of complex genomic rearrangements in yeast reveals mechanisms of repeat-mediated double-strand break repair Genome Res. (IF 11.922) Pub Date : 2017-12-01 Ryan J. McGinty; Rachel G. Rubinstein; Alexander J. Neil; Margaret Dominska; Denis Kiktev; Thomas D. Petes; Sergei M. Mirkin
Improper DNA double-strand break (DSB) repair results in complex genomic rearrangements (CGRs) in many cancers and various congenital disorders in humans. Trinucleotide repeat sequences, such as (GAA)n repeats in Friedreich's ataxia, (CTG)n repeats in myotonic dystrophy, and (CGG)n repeats in fragile X syndrome, are also subject to double-strand breaks within the repetitive tract followed by DNA repair. Mapping the outcomes of CGRs is important for understanding their causes and potential phenotypic effects. However, high-resolution mapping of CGRs has traditionally been a laborious and highly skilled process. Recent advances in long-read DNA sequencing technologies, specifically Nanopore sequencing, have made possible the rapid identification of CGRs with single base pair resolution. Here, we have used whole-genome Nanopore sequencing to characterize several CGRs that originated from naturally occurring DSBs at (GAA)n microsatellites in Saccharomyces cerevisiae. These data gave us important insights into the mechanisms of DSB repair leading to CGRs.
An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics Genome Res. (IF 11.922) Pub Date : 2017-12-01 Ulrich Omasits; Adithi R. Varadarajan; Michael Schmid; Sandra Goetze; Damianos Melidis; Marc Bourqui; Olga Nikolayeva; Maxime Québatte; Andrea Patrignani; Christoph Dehio; Juerg E. Frey; Mark D. Robinson; Bernd Wollscheid; Christian H. Ahrens
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies in the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms, and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides to prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites, and wrongly annotated pseudogenes. Most novelties were confirmed by targeted parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens, and Escherichia coli, together with the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
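To make the idea of "a six-frame translation considering alternative start codons" concrete, the sketch below enumerates stop-terminated ORFs in all six reading frames and keeps every in-frame candidate start for each ORF. It makes several assumptions: that ATG, GTG, and TTG are the start codons of interest, that the longest form of each ORF runs from its most upstream start, and that ORFs lacking a stop codon before the sequence end are dropped. It reports coordinates rather than translating to protein, and it is not the iPtgxDB software.

```python
from typing import List, NamedTuple, Tuple

# Assumption: common prokaryotic start codons; a real pipeline would take these
# from the organism's translation table.
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class ORF(NamedTuple):
    strand: str                  # "+" or "-"
    frame: int                   # 0, 1 or 2 on the scanned strand
    start: int                   # most upstream start codon (0-based)
    end: int                     # exclusive end; includes the stop codon
    alt_starts: Tuple[int, ...]  # every in-frame start codon, upstream first


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def six_frame_orfs(seq: str) -> List[ORF]:
    """Scan all six frames and report each stop-terminated ORF with its candidate starts.

    Coordinates refer to the strand being scanned (the reverse complement for "-").
    """
    seq = seq.upper()
    orfs: List[ORF] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            starts: List[int] = []
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOPS:
                    if starts:  # only emit if at least one start preceded this stop
                        orfs.append(ORF(strand, frame, starts[0], i + 3, tuple(starts)))
                    starts = []
                elif codon in STARTS:
                    starts.append(i)
    return orfs
```

For example, `six_frame_orfs("ATGAAATTGTAA")` returns `[ORF(strand='+', frame=0, start=0, end=12, alt_starts=(0, 6))]`. The single ORF carries both the ATG and the downstream TTG as alternative starts, which is the kind of start-site ambiguity that proteomics evidence can then resolve.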
Chromatin accessibility dynamics reveal novel functional enhancers in C. elegans Genome Res. (IF 11.922) Pub Date : 2017-12-01 Aaron C. Daugherty; Robin W. Yeo; Jason D. Buenrostro; William J. Greenleaf; Anshul Kundaje; Anne Brunet
Chromatin accessibility, a crucial component of genome regulation, has primarily been studied in homogeneous and simple systems, such as isolated cell populations or early-development models. Whether chromatin accessibility can be assessed with high sensitivity in complex, dynamic systems in vivo remains largely unexplored. In this study, we use ATAC-seq to identify chromatin accessibility changes in a whole animal, the model organism Caenorhabditis elegans, from embryogenesis to adulthood. Chromatin accessibility changes between developmental stages are highly reproducible, recapitulate histone modification changes, and reveal key regulatory aspects of the epigenomic landscape throughout organismal development. We find that over 5000 distal noncoding regions exhibit dynamic changes in chromatin accessibility between developmental stages and could thereby represent putative enhancers. When tested in vivo, several of these putative enhancers indeed drive novel cell-type- and temporal-specific patterns of expression. Finally, by integrating transcription factor binding motifs in a machine learning framework, we identify EOR-1 as a unique transcription factor that may regulate chromatin dynamics during development. Our study provides a unique resource for C. elegans, a system in which the prevalence and importance of enhancers remain poorly characterized, and demonstrates the power of using whole-organism chromatin accessibility to identify novel regulatory regions in complex systems.
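The abstract calls regions "dynamic" when their accessibility changes between developmental stages, but it does not give the criterion. The sketch below illustrates one simple way such changes could be flagged: normalize ATAC-seq read counts per region to counts per million, then keep regions whose log2 fold change between two stages passes a threshold. The threshold, the pseudocount, and the function names are hypothetical. The sketch uses no replicates or statistical testing, so it should not be read as the authors' procedure.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale raw read counts per region to counts per million (CPM)."""
    total = sum(counts.values())
    return {region: c * 1e6 / total for region, c in counts.items()}


def dynamic_regions(
    stage_a: Dict[str, float],
    stage_b: Dict[str, float],
    min_abs_log2fc: float = 1.0,  # hypothetical threshold (2-fold change)
    pseudocount: float = 1.0,     # avoids division by zero for absent regions
) -> Dict[str, float]:
    """Return region -> log2(stage_b / stage_a) for regions passing the threshold."""
    a, b = cpm(stage_a), cpm(stage_b)
    flagged: Dict[str, float] = {}
    for region in sorted(a.keys() | b.keys()):
        lfc = math.log2((b.get(region, 0.0) + pseudocount) / (a.get(region, 0.0) + pseudocount))
        if abs(lfc) >= min_abs_log2fc:
            flagged[region] = lfc
    return flagged


if __name__ == "__main__":
    embryo = {"r1": 500_000, "r2": 500_000}
    adult = {"r1": 800_000, "r2": 200_000}
    print(dynamic_regions(embryo, adult))
```

Here only `r2` is flagged (about a 2.5-fold loss). The 1.6-fold gain at `r1` falls below the 2-fold cutoff, which shows how strongly the chosen threshold shapes how many regions are called dynamic.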
Genome-wide discovery of active regulatory elements and transcription factor footprints in Caenorhabditis elegans using DNase-seq Genome Res. (IF 11.922) Pub Date : 2017-12-01 Margaret C.W. Ho; Porfirio Quintero-Cadena; Paul W. Sternberg
Deep sequencing of size-selected DNase I–treated chromatin (DNase-seq) allows high-resolution measurement of chromatin accessibility to DNase I cleavage, permitting identification of de novo active cis-regulatory modules (CRMs) and individual transcription factor (TF) binding sites. We adapted DNase-seq to nuclei isolated from C. elegans embryos and L1 arrest larvae to generate high-resolution maps of TF binding. Over half of embryonic DNase I hypersensitive sites (DHSs) were annotated as noncoding, with 24% in intergenic regions, 12% in promoters, and 28% in introns; similar statistics were observed in L1 arrest larvae. Noncoding DHSs are highly conserved and enriched in marks of enhancer activity and transcription. We validated noncoding DHSs against known enhancers from myo-2, myo-3, hlh-1, elt-2, and lin-26/lir-1 and recapitulated 15 of 17 known enhancers. We then mined DNase-seq data to identify putative active CRMs and TF footprints. Using DNase-seq data improved predictions of tissue-specific expression compared with motifs alone. In a pilot functional test, 10 of 15 DHSs from pha-4, icl-1, and ceh-13 drove reporter gene expression in transgenic C. elegans. Overall, we provide experimental annotation of 26,644 putative CRMs in the embryo containing 55,890 TF footprints, as well as 15,841 putative CRMs in the L1 arrest larvae containing 32,685 TF footprints.
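The figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above. The derived ratios (enhancer recall, pilot hit rate, footprints per CRM) are back-of-the-envelope values computed here, not results reported by the authors.

```python
# Figures quoted in the DNase-seq abstract above.
noncoding_pct = {"intergenic": 24, "promoter": 12, "intron": 28}  # embryonic DHSs
known_enhancers = (15, 17)   # (recapitulated, tested)
pilot_reporters = (10, 15)   # (drove expression, tested)
crms_and_footprints = {      # stage -> (putative CRMs, TF footprints)
    "embryo": (26_644, 55_890),
    "L1 arrest": (15_841, 32_685),
}


def summarize() -> dict:
    """Derive simple ratios from the reported counts."""
    return {
        "noncoding_pct": sum(noncoding_pct.values()),
        "enhancer_recall": round(known_enhancers[0] / known_enhancers[1], 3),
        "pilot_hit_rate": round(pilot_reporters[0] / pilot_reporters[1], 3),
        "footprints_per_crm": {
            stage: round(footprints / crms, 2)
            for stage, (crms, footprints) in crms_and_footprints.items()
        },
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

The noncoding categories sum to 24 + 12 + 28 = 64% of embryonic DHSs, which is consistent with "over half." Both stages average roughly two TF footprints per putative CRM.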
Some contents have been Reproduced by permission of The Royal Society of Chemistry.