Current journal: Genome Biology
  • A benchmark of batch-effect correction methods for single-cell RNA sequencing data
    Genome Biol. (IF 14.028) Pub Date : 2020-01-16
    Hoa Thi Nhu Tran; Kok Siong Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Yee Shin Lee; Michelle Goh; Jinmiao Chen

    Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal. We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics including kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression. Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.

    Updated: 2020-01-16
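    Two of the metrics named in this abstract are straightforward to compute. The sketch below assumes plain Python lists of labels and 2-D coordinates. It implements the standard adjusted Rand index (ARI) and a simplified, unweighted k-nearest-neighbour version of the inverse Simpson index that underlies LISI. The function names are hypothetical and this is not the benchmark's code. The published LISI weights neighbours with a perplexity-calibrated Gaussian kernel, which is omitted here. In benchmarks of this kind, ARI typically compares clusters with cell-type labels. LISI computed on batch labels scores how well batches mix: values near 1 mean no mixing, and higher values mean more mixing.

```python
import math
from collections import Counter


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert-Arabie form) between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = math.comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on
    # Cell pairs sharing a label within each contingency cell, row (true) and column (pred).
    same_both = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (same_both - expected) / (max_index - expected)


def knn_inverse_simpson(points, batches, k):
    """Simplified, unweighted LISI-style score: inverse Simpson index of the batch
    labels among each cell's k nearest neighbours (Euclidean distance, self excluded)."""
    if len(points) != len(batches) or len(points) < 2 or k < 1:
        raise ValueError("need >= 2 cells, one batch label per cell, and k >= 1")
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other cell; adequate for a toy example.
        neighbours = sorted(
            ((math.dist(p, q), batches[j]) for j, q in enumerate(points) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        counts = Counter(batch for _, batch in neighbours)
        simpson = sum((c / len(neighbours)) ** 2 for c in counts.values())
        scores.append(1.0 / simpson)
    return scores


# Usage examples.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
print(knn_inverse_simpson(points, list("aaaabbbb"), k=3))  # batches segregated: all 1.0
print(knn_inverse_simpson(points, list("abbaabba"), k=3))  # batches mixed: all about 1.8
```

    ARI is invariant to how clusters are named, so swapping label identities leaves it at 1.0. The toy LISI rises from 1 toward the number of batches as batches interleave within local neighbourhoods.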
  • DENDRO: genetic heterogeneity profiling and subclone detection by single-cell RNA sequencing
    Genome Biol. (IF 14.028) Pub Date : 2020-01-14
    Zilu Zhou; Bihui Xu; Andy Minn; Nancy R. Zhang

    Although scRNA-seq is now ubiquitously adopted in studies of intratumor heterogeneity, detection of somatic mutations and inference of clonal membership from scRNA-seq are currently unreliable. We propose DENDRO, an analysis method for scRNA-seq data that clusters single cells into genetically distinct subclones and reconstructs the phylogenetic tree relating the subclones. DENDRO utilizes transcribed point mutations and accounts for technical noise and expression stochasticity. We benchmark DENDRO and demonstrate its application on simulation data and real data from three cancer types. In particular, on a mouse melanoma model in response to immunotherapy, DENDRO delineates the role of neoantigens in treatment response.

    Updated: 2020-01-14
  • HIFI: estimating DNA-DNA interaction frequency from Hi-C data at restriction-fragment resolution
    Genome Biol. (IF 14.028) Pub Date : 2020-01-14
    Christopher JF Cameron; Josée Dostie; Mathieu Blanchette

    Hi-C is a popular technique to map three-dimensional chromosome conformation. In principle, Hi-C’s resolution is limited only by the size of restriction fragments. However, insufficient sequencing depth forces researchers to artificially reduce the resolution of Hi-C matrices at the cost of biological interpretability. We present the Hi-C Interaction Frequency Inference (HIFI) algorithms, which accurately estimate restriction-fragment-resolution Hi-C matrices by exploiting dependencies between neighboring fragments. Cross-validation experiments and comparisons to 5C data and known regulatory interactions demonstrate HIFI’s superiority over existing approaches. In addition, HIFI’s restriction-fragment resolution reveals a new role for active regulatory regions in structuring topologically associating domains.

    Updated: 2020-01-14
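    The resolution trade-off described in this abstract can be made concrete. The following minimal sketch assumes a square list-of-lists matrix of raw contact counts whose rows and columns are restriction fragments in genomic order, and it sums counts over blocks of `factor` consecutive fragments. This is the conventional resolution reduction that HIFI is designed to avoid, not HIFI itself. The name `coarsen_contact_matrix` is hypothetical, and real pipelines usually bin by fixed genomic length (in base pairs) rather than by fragment count.

```python
def coarsen_contact_matrix(matrix, factor):
    """Sum a square fragment-level contact matrix over factor x factor blocks.

    Rows/columns are assumed to be restriction fragments in genomic order; a
    trailing group smaller than `factor` becomes its own (partial) bin.
    """
    n = len(matrix)
    if factor < 1 or any(len(row) != n for row in matrix):
        raise ValueError("need a square matrix and factor >= 1")
    n_bins = -(-n // factor)  # ceiling division
    binned = [[0] * n_bins for _ in range(n_bins)]
    for i in range(n):
        for j in range(n):
            # Fragment i falls in bin i // factor; counts from all fragment pairs are pooled.
            binned[i // factor][j // factor] += matrix[i][j]
    return binned


fragment_counts = [
    [1, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 4, 2],
    [0, 0, 2, 5],
]
print(coarsen_contact_matrix(fragment_counts, 2))  # [[8, 1], [1, 13]]
```

    Each coarse bin pools more reads, which is why depth-limited data are often analysed at lower resolution. The price is that interactions between individual fragments inside a bin can no longer be told apart.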
  • Clustered CTCF binding is an evolutionary mechanism to maintain topologically associating domains
    Genome Biol. (IF 14.028) Pub Date : 2020-01-07
    Elissavet Kentepozidou; Sarah J. Aitken; Christine Feig; Klara Stefflova; Ximena Ibarra-Soria; Duncan T. Odom; Maša Roller; Paul Flicek

    CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive. We carry out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We perform CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associate them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by a balance of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor dynamically evolving clusters containing both evolutionarily old and young CTCF sites as a result of the repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. Dynamic conservation of CTCF site clusters is an apparently important feature of CTCF binding evolution that is critical to the functional stability of a higher-order chromatin structure.

    Updated: 2020-01-07
  • Non-coding RNAs underlie genetic predisposition to breast cancer
    Genome Biol. (IF 14.028) Pub Date : 2020-01-07
    Mahdi Moradi Marjaneh; Jonathan Beesley; Tracy A. O’Mara; Pamela Mukhopadhyay; Lambros T. Koufariotis; Stephen Kazakoff; Nehal Hussein; Laura Fachal; Nenad Bartonicek; Kristine M. Hillman; Susanne Kaufmann; Haran Sivakumaran; Chanel E. Smart; Amy E. McCart Reed; Kaltin Ferguson; Jodi M. Saunus; Sunil R. Lakhani; Daniel R. Barnes; Antonis C. Antoniou; Marcel E. Dinger; Nicola Waddell; Douglas F. Easton; Alison M. Dunning; Georgia Chenevix-Trench; Stacey L. Edwards; Juliet D. French

    Genetic variants identified through genome-wide association studies (GWAS) are predominantly non-coding and typically attributed to altered regulatory elements such as enhancers and promoters. However, the contribution of non-coding RNAs to complex traits is not clear. Using targeted RNA sequencing, we systematically annotated multi-exonic non-coding RNA (mencRNA) genes transcribed from 1.5-Mb intervals surrounding 139 breast cancer GWAS signals and assessed their contribution to breast cancer risk. We identify more than 4000 mencRNA genes and show their expression distinguishes normal breast tissue from tumors and different breast cancer subtypes. Importantly, breast cancer risk variants, identified through genetic fine-mapping, are significantly enriched in mencRNA exons, but not the promoters or introns. eQTL analyses identify mencRNAs whose expression is associated with risk variants. Furthermore, chromatin interaction data identify hundreds of mencRNA promoters that loop to regions that contain breast cancer risk variants. We have compiled the largest catalog of breast cancer-associated mencRNAs to date and provide evidence that modulation of mencRNAs by GWAS variants may provide an alternative mechanism underlying complex traits.

    Updated: 2020-01-07
  • Chromatin interactome mapping at 139 independent breast cancer risk signals
    Genome Biol. (IF 14.028) Pub Date : 2020-01-07
    Jonathan Beesley; Haran Sivakumaran; Mahdi Moradi Marjaneh; Luize G. Lima; Kristine M. Hillman; Susanne Kaufmann; Natasha Tuano; Nehal Hussein; Sunyoung Ham; Pamela Mukhopadhyay; Stephen Kazakoff; Jason S. Lee; Kyriaki Michailidou; Daniel R. Barnes; Antonis C. Antoniou; Laura Fachal; Alison M. Dunning; Douglas F. Easton; Nicola Waddell; Joseph Rosenbluh; Andreas Möller; Georgia Chenevix-Trench; Juliet D. French; Stacey L. Edwards

    Genome-wide association studies have identified 196 high confidence independent signals associated with breast cancer susceptibility. Variants within these signals frequently fall in distal regulatory DNA elements that control gene expression. We designed a Capture Hi-C array to enrich for chromatin interactions between the credible causal variants and target genes in six human mammary epithelial and breast cancer cell lines. We show that interacting regions are enriched for open chromatin, histone marks for active enhancers, and transcription factors relevant to breast biology. We exploit this comprehensive resource to identify candidate target genes at 139 independent breast cancer risk signals and explore the functional mechanism underlying altered risk at the 12q24 risk region. Our results demonstrate the power of combining genetics, computational genomics, and molecular studies to rationalize the identification of key variants and candidate target genes at breast cancer GWAS signals.

    Updated: 2020-01-07
  • Functional consequences of archaic introgression and their impact on fitness
    Genome Biol. (IF 14.028) Pub Date : 2020-01-02
    Maxime Rotival; Lluis Quintana-Murci

    Anatomically modern humans started to exit Africa for the first time at least 60,000 years ago (ya). Along their journey across the globe, they encountered and admixed with other hominins that are now extinct, such as the Neanderthals or Denisovans. Given the deep divergence time between ancient hominins and modern humans, such admixture events left molecular traces in non-African populations that are still visible today in their genomes [1]. Over the past few years, evidence has accumulated suggesting that these segments of “archaic” DNA have the potential to contribute to phenotypic differences between contemporary individuals and populations [2]. Yet, to understand the genuine contribution of archaic alleles to the genetic architecture of complex traits, it is necessary to account for the diverse selective pressures that have acted upon introgressed alleles. Here, we discuss recent findings on how natural selection, either negative or positive, has shaped the landscape of Neanderthal ancestry in the genomes of modern Eurasians, and comment on the contribution of archaic haplotypes to present-day phenotypic variation.

    It has been suggested that the vast majority of alleles that Neanderthals contributed to modern humans were deleterious. The low genetic diversity of the available Neanderthal genomes indeed indicates that they had a limited effective population size, about 10-fold smaller than that of modern humans (Fig. 1a). Consequently, natural selection is expected to have been less efficient at removing deleterious mutations from the genome of Neanderthals than from the genome of modern humans [3]. Using forward simulations, Harris and Nielsen have shown that, prior to the admixture event(s), modern humans had higher fitness than Neanderthals, owing to a lower burden of deleterious alleles.

    Fig. 1 The fate of introgressed archaic haplotypes in the modern human genome. a Simplified demographic model of human populations. The size of the branches reflects effective population sizes (Ne), and a red arrow indicates Neanderthal introgression. Numbers indicate the relative position of the ancestral and present-day populations on the tree. b Haplotype structures and trajectory of archaic ancestry at three different regions that harbor distinct types of genetic variants (deleterious additive, deleterious recessive, beneficial). For ancestry trajectories, the horizontal dotted line indicates the initial introgression frequency, and the green arrow represents the onset of selection for the beneficial allele. For haplotype structures, haplotypes are represented as columns. Neutral alleles are shown in blue, deleterious alleles in red (additive) or orange (recessive), and beneficial alleles in green.

    Assuming that the effect of deleterious mutations is mostly additive, Harris and Nielsen estimated that Neanderthal DNA was rapidly purged from the human genome after admixture, dropping from ~10% to the 2–3% currently observed in Eurasians [3] (Fig. 1b, upper panel). The purging was exacerbated in highly constrained regions, which exhibit decreased levels of Neanderthal ancestry. The rate of introgression is indeed strongly dependent on the intensity of background selection, a measure of the degree of linkage with highly conserved regions. Conversely, in regions where most deleterious variants are recessive, Neanderthal ancestry may actually have been selected for [3] (Fig. 1b, middle panel). In these regions, a moderate rate of admixture confers a selective advantage on admixed individuals by increasing heterozygosity and decreasing the deleterious load. Further efforts are required to systematically quantify the deleteriousness of alleles that were present in the Neanderthal genome and the relative impact of recessive versus additive variants on the fate of introgressed haplotypes.
    This, combined with measures of the local rate of human/Neanderthal divergence, will provide a better picture of the heterogeneous landscape of Neanderthal ancestry along the genome of modern humans.

    Natural selection has had a profound impact on the landscape of introgressed archaic functional alleles. For example, Dannemann et al. have shown that non-synonymous archaic alleles that segregate today in the human population tend to be less deleterious than non-synonymous alleles that segregate at similar frequency on non-archaic haplotypes [4]. Furthermore, archaic introgression appears to be less pronounced in functionally relevant regions, such as promoters or protein-coding regions, than in other elements such as enhancers [5, 6]. Because enhancers collectively cover a larger fraction of the genome, they are the functional elements that carry the largest number of Neanderthal alleles [6]. A significant fraction of the phenotypic impact of Neanderthal introgression is therefore expected to be mediated by changes in enhancer activity. Despite the overall purging of archaic haplotypes from the genome of modern humans, Neanderthal haplotypes have been found to harbor more regulatory potential than their non-Neanderthal counterparts with similar allele frequency [4, 7]. This observation can be explained by an increased adaptive nature of Neanderthal haplotypes or, more simply, by the increase in local genetic diversity induced by the introgression event, owing to the high divergence between Neanderthals and modern humans. Massively parallel reporter assays, combined with deep learning approaches, may provide further insights into the mechanisms through which Neanderthal material, and specific genetic variants, affect human phenotypes.

    To characterize the regulatory effects of archaic haplotypes, McCoy et al. have compared the relative expression of archaic and non-archaic alleles in a collection of 44 diverse tissues from the Genotype-Tissue Expression (GTEx) database [8].
    Neanderthal haplotypes tend to be biased towards lower expression levels, an effect that is most pronounced in the brain and testis. This observation has been interpreted as supporting the occurrence of genetic incompatibilities between Neanderthals and modern humans, due to epistatic interactions as predicted by the Dobzhansky–Muller model of speciation (i.e., the fixation of incompatible mutations in two offspring lineages that share a common parental lineage). Reduced archaic ancestry on the X chromosome and near testis-expressed genes further supports the notion of a high rate of infertility among first-generation hybrids [1]. The enrichment of archaic haplotypes among loci associated with neurological and psychiatric disorders [2], together with the lower expression of archaic haplotypes in the brain, suggests that epistatic effects also affected cognitive capacities in hybrid individuals. Further work is clearly needed to assess the contribution of epistatic incompatibilities to the purging of functional Neanderthal alleles from the human lineage.

    Despite the overall deleteriousness of Neanderthal material in the genomes of modern humans [5], it is increasingly accepted that, in some cases, archaic DNA allowed early Eurasians to adapt to their newly encountered environments (Fig. 1b, lower panel). Detecting these events of adaptive introgression remains a daunting task, as the signatures used to detect positive selection (e.g., extended haplotype homozygosity) are similar to those left by archaic introgression, leading to spurious signals. To efficiently capture the adaptive nature of introgression, Racimo et al. have proposed a statistical framework based on the number and allelic frequencies of sites that are uniquely shared between archaic hominins and specific modern populations [9].
    Using this framework, multiple genomic regions presenting compelling evidence of adaptive introgression have been detected [9], including regions associated with skin pigmentation or response to UV radiation and genes such as BNC2, POU2F3, or HYAL3. Metabolic processes have also been identified as targets of adaptive introgression, including genes such as SLC16A11, known to alter lipid metabolism and type 2 diabetes risk, or TBX15/WARS2, associated with adipose tissue differentiation and body fat distribution. Importantly, immune functions appear to be privileged targets of adaptive introgression, suggesting that modern humans acquired, from Neanderthals, adaptive variants related to host survival against infection. Evidence supporting this notion has been reported for the Toll-like receptor TLR1/6/10 cluster, primarily involved in the sensing of bacterial products, and for several antiviral response genes, such as the NOD-like receptor NLRC5, the cytoplasmic sensor IFIH1, or the restriction factors OAS1/OAS3. Interestingly, an excess of regulatory variants (i.e., eQTLs) controlling transcriptional responses to viral stimuli has also been reported among Neanderthal haplotypes, relative to non-archaic haplotypes [7]. Consistent with these results, Enard and Petrov have shown that adaptive introgression from Neanderthals has been pervasive among human virus-interacting proteins (VIPs), with the strongest enrichment observed for VIPs interacting with RNA viruses [10]. These results collectively emphasize the important role of introgression in human adaptation, in particular to pathogen pressures. Yet, new methods are needed to characterize how subtle but coordinated shifts in the frequency of archaic haplotypes have contributed to modern human adaptation involving polygenic traits.

    The phenotypic impact of adaptively introgressed haplotypes is mediated, in some cases, by genetic variants that are not themselves of Neanderthal origin.
    This is notably the case for the well-characterized OAS1 locus [11]. The rs10774671-G allele, which is present in Europeans specifically on Neanderthal haplotypes, alters the splicing patterns of OAS1, leading to increased antiviral activity. Interestingly, this variant is also present at high frequency in African populations, where it lies on a distinct haplotypic background. These observations suggest that an introgression event occurring in Eurasians and targeting the OAS1 locus re-introduced a beneficial allele that had been lost during the out-of-Africa bottleneck. Recent work by Rinker et al. suggests that the re-introduction through introgression of ancient functional alleles (i.e., alleles predating the split between Neanderthals and modern humans) has been a common phenomenon [12]. Yet, the extent to which such re-introduced variants have contributed to the adaptive nature of archaic introgression remains an open question.

    Most efforts to understand the functional consequences of archaic introgression have focused on Neanderthals and modern Eurasians, primarily of European ancestry. However, the number of ancient genomes is increasing, including high-coverage whole genomes from various Neanderthals (Altai, Vindija, and Chagyrskaya) and the Altai Denisovan. This, together with the possibility of identifying segments of archaic DNA directly from modern genomes, clearly opens a highly informative window onto the patterns of population diversity of ancient, now-extinct hominins and their admixture history with modern humans. Dissecting and quantifying the archaic ancestry in the genomes of modern Oceanians, for example, offers unprecedented access to the past history of Denisovans and the extent to which they contributed to the adaptation of early modern humans entering the Pacific.
    Likewise, the possibility that early Africans also admixed with a yet-unknown ancient hominin is increasingly supported but needs further investigation, both methodological and empirical. Exciting times lie ahead that, together, will provide a much finer understanding of the functional consequences of archaic introgression in modern humans, their adaptive nature, and their contribution to the diversity of human phenotypes.

    References
    1. Sankararaman S, Mallick S, Dannemann M, Prufer K, Kelso J, Paabo S, Patterson N, Reich D. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507:354–7.
    2. Simonti CN, Vernot B, Bastarache L, Bottinger E, Carrell DS, Chisholm RL, Crosslin DR, Hebbring SJ, Jarvik GP, Kullo IJ, et al. The phenotypic legacy of admixture between modern humans and Neandertals. Science. 2016;351:737–41.
    3. Harris K, Nielsen R. The genetic cost of Neanderthal introgression. Genetics. 2016;203:881–91.
    4. Dannemann M, Prufer K, Kelso J. Functional implications of Neandertal introgression in modern humans. Genome Biol. 2017;18:61.
    5. Petr M, Paabo S, Kelso J, Vernot B. Limits of long-term selection against Neandertal introgression. Proc Natl Acad Sci U S A. 2019;116:1639–44.
    6. Silvert M, Quintana-Murci L, Rotival M. Impact and evolutionary determinants of Neanderthal introgression on transcriptional and post-transcriptional regulation. Am J Hum Genet. 2019;104:1241–50.
    7. Quach H, Rotival M, Pothlichet J, Loh YE, Dannemann M, Zidane N, Laval G, Patin E, Harmant C, Lopez M, et al. Genetic adaptation and Neandertal admixture shaped the immune system of human populations. Cell. 2016;167:643–56.
    8. McCoy RC, Wakefield J, Akey JM. Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression. Cell. 2017;168:916–27.
    9. Racimo F, Marnetto D, Huerta-Sanchez E. Signatures of archaic adaptive introgression in present-day human populations. Mol Biol Evol. 2017;34:296–317.
    10. Enard D, Petrov DA. Evidence that RNA viruses drove adaptive introgression between Neanderthals and modern humans. Cell. 2018;175:360–71.
    11. Sams AJ, Dumaine A, Nédélec Y, Yotova V, Alfieri C, Tanner JE, Messer PW, Barreiro LB. Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans. Genome Biol. 2016;17:246.
    12. Rinker DC, Simonti CN, McArthur E, Shaw D, Hodges E, Capra JA. Neanderthal introgression reintroduced functional ancestral alleles lost in Eurasian populations. bioRxiv. 2019. https://doi.org/10.1101/533257.

    The laboratory of L.Q.-M. is supported by the Institut Pasteur, the Collège de France, the French Government’s Investissement d’Avenir program, Laboratoires d’Excellence “Integrative Biology of Emerging Infectious Diseases” (ANR-10-LABX-62-IBEID) and “Milieu Intérieur” (ANR-10-LABX-69-01), and the Fondation pour la Recherche Médicale (Equipe FRM DEQ20180339214).
    Affiliations: Unit of Human Evolutionary Genetics, CNRS UMR2000, Institut Pasteur, 75015 Paris, France (Maxime Rotival, Lluis Quintana-Murci); Chair Human Genomics & Evolution, Collège de France, 75005 Paris, France (Lluis Quintana-Murci).
    Contributions: Both authors read and approved the final manuscript.
    Corresponding author: Lluis Quintana-Murci.
    Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
    Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
    Cite this article: Rotival, M., Quintana-Murci, L. Functional consequences of archaic introgression and their impact on fitness. Genome Biol 21, 3 (2020). Published 02 January 2020. DOI: https://doi.org/10.1186/s13059-019-1920-z

    Updated: 2020-01-02
  • scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation
    Genome Biol. (IF 14.028) Pub Date : 2019-12-31
    E. Madissoon; A. Wilbrey-Clark; R. J. Miragaia; K. Saeb-Parsy; K. T. Mahbubani; N. Georgakopoulos; P. Harding; K. Polanski; N. Huang; K. Nowicki-Osuch; R. C. Fitzgerald; K. W. Loudon; J. R. Ferdinand; M. R. Clatworthy; A. Tsingene; S. van Dongen; M. Dabrowska; M. Patel; M. J. T. Stubbington; S. A. Teichmann; O. Stegle; K. B. Meyer

    The Human Cell Atlas is a large international collaborative effort to map all cell types of the human body. Single-cell RNA sequencing can generate high-quality data for the delivery of such an atlas. However, delays between fresh sample collection and processing may lead to poor data and difficulties in experimental design. This study assesses the effect of cold storage on fresh healthy spleen, esophagus, and lung from ≥ 5 donors over 72 h. We collect 240,000 high-quality single-cell transcriptomes with detailed cell type annotations and whole-genome sequences of donors, enabling future eQTL studies. Our data provide a valuable resource for the study of these 3 organs and will allow cross-organ comparison of cell types. We see little effect of cold ischemic time on cell yield, total number of reads per cell, and other quality control metrics in any of the tissues within the first 24 h. However, we observe a decrease in the proportion of lung T cells at 72 h, a higher percentage of mitochondrial reads, and increased contamination by ambient background RNA reads in the 72-h spleen samples, which is cell type specific. In conclusion, we present robust protocols for tissue preservation for up to 24 h prior to scRNA-seq analysis. This greatly facilitates the logistics of sample collection for Human Cell Atlas or clinical studies, since it extends the time frame for sample processing.

    Updated: 2019-12-31
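    The percentage of mitochondrial reads mentioned in this abstract is a standard per-cell quality-control metric. The following minimal sketch assumes that each cell is given as a dictionary mapping gene symbol to read or UMI count, and that human mitochondrial genes carry the `MT-` prefix (as in symbols such as MT-CO1). The function names and the 20% cut-off are hypothetical illustrations, not values or code from this study.

```python
def percent_mito(counts_by_gene, prefix="MT-"):
    """Percentage of a cell's counts assigned to mitochondrially encoded genes."""
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0  # empty cell: report 0 rather than dividing by zero
    mito = sum(count for gene, count in counts_by_gene.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def flag_high_mito(cells, max_pct=20.0, prefix="MT-"):
    """IDs of cells whose mitochondrial percentage exceeds max_pct (hypothetical cut-off)."""
    return [cell_id for cell_id, counts in cells.items() if percent_mito(counts, prefix) > max_pct]


cells = {
    "cell_1": {"MT-CO1": 30, "ACTB": 60, "MT-ND1": 10},
    "cell_2": {"MT-CO1": 5, "ACTB": 80, "CD3E": 15},
}
print(percent_mito(cells["cell_1"]))  # 40.0
print(flag_high_mito(cells))  # ['cell_1']
```

    A prefix match is the simplest way to pick out mitochondrial genes. Annotations that use other naming schemes (for example, lowercase `mt-` in mouse) would need a different `prefix` or an explicit gene list.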
  • Distinct epigenetic features of tumor-reactive CD8+ T cells in colorectal cancer patients revealed by genome-wide DNA methylation analysis
    Genome Biol. (IF 14.028) Pub Date : 2019-12-31
    Rui Yang; Sijin Cheng; Nan Luo; Ranran Gao; Kezhuo Yu; Boxi Kang; Li Wang; Qiming Zhang; Qiao Fang; Lei Zhang; Chen Li; Aibin He; Xueda Hu; Jirun Peng; Xianwen Ren; Zemin Zhang

    Tumor-reactive CD8+ tumor-infiltrating lymphocytes (TILs) represent a subtype of T cells that can specifically recognize and destroy tumor cells. Understanding the regulatory mechanisms of tumor-reactive CD8+ T cells has important therapeutic implications. Yet the DNA methylation status of this T cell subtype has not been elucidated. In this study, we segregate tumor-reactive and bystander CD8+ TILs, as well as naïve and effector memory CD8+ T cells as controls, from colorectal cancer patients to compare their transcriptome and methylome characteristics. Transcriptome profiling confirms previous conclusions that tumor-reactive TILs have an exhausted tissue-resident memory signature. Whole-genome methylation profiling identifies a distinct methylome pattern of tumor-reactive CD8+ T cells, with tumor-reactive markers CD39 and CD103 being specifically demethylated. In addition, dynamic changes are observed during the transition of naïve T cells into tumor-reactive CD8+ T cells. Transcription factor binding motif enrichment analysis identifies several immune-related transcription factors, including three exhaustion-related genes (NR4A1, BATF, and EGR2) and VDR, which potentially play an important regulatory role in tumor-reactive CD8+ T cells. Our study supports the involvement of DNA methylation in shaping tumor-reactive and bystander CD8+ TILs, and provides a valuable resource for the development of novel DNA methylation markers and future therapeutics.

    Updated: 2019-12-31
  • Microbial genomes from non-human primate gut metagenomes expand the primate-associated bacterial tree of life with over 1000 novel species
    Genome Biol. (IF 14.028) Pub Date : 2019-12-28
    Serena Manara; Francesco Asnicar; Francesco Beghini; Davide Bazzani; Fabio Cumbo; Moreno Zolfo; Eleonora Nigro; Nicolai Karcher; Paolo Manghi; Marisa Isabell Metzger; Edoardo Pasolli; Nicola Segata

    Humans have coevolved with microbial communities to establish a mutually advantageous relationship that is still poorly characterized and can provide a better understanding of the human microbiome. Comparative metagenomic analysis of human and non-human primate (NHP) microbiomes offers a promising approach to study this symbiosis. Very few microbial species have been characterized in NHP microbiomes due to their poor representation in the available cataloged microbial diversity, thus limiting the potential of such comparative approaches. We reconstruct over 1000 previously uncharacterized microbial species from 6 available NHP metagenomic cohorts, resulting in an increase of the mappable fraction of metagenomic reads by 600%. These novel species highlight that almost 90% of the microbial diversity associated with NHPs has been overlooked. Comparative analysis of this new catalog of taxa with the collection of over 150,000 genomes from human metagenomes points at a limited species-level overlap, with only 20% of microbial candidate species in NHPs also found in the human microbiome. This overlap occurs mainly between NHPs and non-Westernized human populations and NHPs living in captivity, suggesting that host lifestyle plays a role comparable to host speciation in shaping the primate intestinal microbiome. Several NHP-specific species are phylogenetically related to human-associated microbes, such as Elusimicrobia and Treponema, and could be the consequence of host-dependent evolutionary trajectories. The newly reconstructed species greatly expand the microbial diversity associated with NHPs, thus enabling better interrogation of the primate microbiome and empowering in-depth human and non-human comparative and co-diversification studies.

    Updated: 2019-12-30
  • mRNA structural elements immediately upstream of the start codon dictate dependence upon eIF4A helicase activity
    Genome Biol. (IF 14.028) Pub Date : 2019-12-30
    Joseph A. Waldron; David C. Tack; Laura E. Ritchey; Sarah L. Gillen; Ania Wilczynska; Ernest Turro; Philip C. Bevilacqua; Sarah M. Assmann; Martin Bushell; John Le Quesne

    The RNA helicase eIF4A1 is a key component of the translation initiation machinery and is required for the translation of many pro-oncogenic mRNAs. There is increasing interest in targeting eIF4A1 therapeutically in cancer; thus, understanding how this protein leads to the selective re-programming of the translational landscape is critical. While it is known that eIF4A1-dependent mRNAs frequently have long GC-rich 5′UTRs, the details of how 5′UTR structure is resculptured by eIF4A1 to enhance the translation of specific mRNAs are unknown. Using Structure-seq2 and polysome profiling, we assess global mRNA structure and translational efficiency in MCF7 cells, with and without eIF4A inhibition with hippuristanol. We find that eIF4A inhibition does not lead to global increases in 5′UTR structure, but rather it leads to 5′UTR remodeling, with localized gains and losses of structure. The degree of these localized structural changes is associated with 5′UTR length, meaning that eIF4A-dependent mRNAs have greater localized gains of structure due to their increased 5′UTR length. However, it is not solely increased localized structure that causes eIF4A-dependency but the position of the structured regions, as these structured elements are located predominantly at the 3′ end of the 5′UTR. By measuring changes in RNA structure following eIF4A inhibition, we show that eIF4A remodels local 5′UTR structures. The location of these structural elements ultimately determines the dependency on eIF4A, with increased structure just upstream of the CDS being the major limiting factor in translation, which is overcome by eIF4A activity.

    Updated: 2019-12-30
  • The somatic mutation landscape of the human body
    Genome Biol. (IF 14.028) Pub Date : 2019-12-24
    Pablo E. García-Nieto; Ashby J. Morrison; Hunter B. Fraser

    Somatic mutations in healthy tissues contribute to aging, neurodegeneration, and cancer initiation, yet they remain largely uncharacterized. To gain a better understanding of the genome-wide distribution and functional impact of somatic mutations, we leverage the genomic information contained in the transcriptome to uniformly call somatic mutations from over 7500 tissue samples, representing 36 distinct tissues. This catalog, containing over 280,000 mutations, reveals a wide diversity of tissue-specific mutation profiles associated with gene expression levels and chromatin states. For example, lung samples with low expression of the mismatch-repair gene MLH1 show a mutation signature of deficient mismatch repair. In addition, we find pervasive negative selection acting on missense and nonsense mutations, except for mutations previously observed in cancer samples, which are under positive selection and are highly enriched in many healthy tissues. These findings reveal fundamental patterns of tissue-specific somatic evolution and shed light on aging and the earliest stages of tumorigenesis.

    Updated: 2019-12-25
  • tmap: an integrative framework based on topological data analysis for population-scale microbiome stratification and association studies
    Genome Biol. (IF 14.028) Pub Date : 2019-12-23
    Tianhua Liao; Yuchen Wei; Mingjing Luo; Guo-Ping Zhao; Haokui Zhou

    Untangling the complex variations of the microbiome associated with large-scale host phenotypes or environment types challenges currently available analytic methods. Here, we present tmap, an integrative framework based on topological data analysis for population-scale microbiome stratification and association studies. The performance of tmap in detecting nonlinear patterns is validated in different simulation scenarios, which demonstrate its superiority over the most commonly used methods. Application of tmap to several population-scale microbiomes demonstrates its strength in revealing microbiome-associated host or environmental features and in understanding the systematic interrelations among their association patterns. tmap is available at https://github.com/GPZ-Bioinfo/tmap.
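    TDA-based stratification tools of this kind commonly build on the Mapper construction from topological data analysis. Mapper works in four steps. It projects samples through a filter (lens) function, covers the lens range with overlapping intervals, clusters the samples falling in each interval, and links clusters that share samples. The Python sketch below illustrates only that generic construction and is not tmap's API. The filter (first coordinate), the cover parameters and the single-linkage threshold `eps` are our own illustrative assumptions.

```python
from itertools import combinations
import math


def single_linkage(points, idx, eps):
    """Split the indices idx into connected components; points closer than eps are joined."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, n_intervals=4, overlap=0.3, eps=1.0):
    """Generic Mapper sketch: returns (nodes, edges).

    Assumptions for illustration: the lens is the first coordinate, the cover uses
    equal-width intervals widened by `overlap` on each side, and clustering inside
    each interval is single linkage with threshold `eps`.
    """
    lens = [p[0] for p in points]
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        idx = [i for i, v in enumerate(lens) if start <= v <= end]
        if idx:
            nodes.extend(single_linkage(points, idx, eps))
    # Two nodes are linked when they share at least one sample.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges
```

Each node is a local cluster of samples, and an edge records samples shared between overlapping intervals. In a microbiome setting the points would be samples embedded in some community-composition space, which this toy sketch does not attempt to model.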

    Updated: 2019-12-23
  • RADAR: differential analysis of MeRIP-seq data with a random effect model
    Genome Biol. (IF 14.028) Pub Date : 2019-12-23
    Zijie Zhang; Qi Zhan; Mark Eckert; Allen Zhu; Agnieszka Chryplewicz; Dario F. De Jesus; Decheng Ren; Rohit N. Kulkarni; Ernst Lengyel; Chuan He; Mengjie Chen

    Epitranscriptome profiling using MeRIP-seq is a powerful technique for in vivo functional studies of reversible RNA modifications. We develop RADAR, a comprehensive analytical tool for detecting differentially methylated loci in MeRIP-seq data. RADAR enables accurate identification of altered methylation sites by using different strategies to accommodate variability in pre-immunoprecipitation expression levels and post-immunoprecipitation counts. In addition, it is compatible with complex study designs in which covariates need to be incorporated into the analysis. Through simulation and real dataset analyses, we show that RADAR leads to more accurate and reproducible differential methylation analysis results than alternative methods. RADAR is available at https://github.com/scottzijiezhang/RADAR.

    Updated: 2019-12-23
  • Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model
    Genome Biol. (IF 14.028) Pub Date : 2019-12-23
    F. William Townes; Stephanie C. Hicks; Martin J. Aryee; Rafael A. Irizarry

    Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
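    The deviance-based feature selection mentioned above can be sketched as follows. This is a minimal NumPy version built on our own assumptions:

    - It uses the binomial approximation to the multinomial model.
    - It compares each gene against a null in which the gene takes a constant fraction of every cell's total UMIs.
    - It ranks genes by the resulting deviance.

    The function name `binomial_deviance` is hypothetical and does not refer to any released package.

```python
import numpy as np


def binomial_deviance(counts):
    """Per-gene binomial deviance for a cells x genes UMI count matrix (illustrative sketch).

    Null model (assumption): gene j takes the same fraction pi_j of every cell's total
    UMIs. Genes whose counts depart most from that constant fraction get high deviance.
    """
    y = np.asarray(counts, dtype=float)
    n = y.sum(axis=1, keepdims=True)   # total UMIs per cell
    pi = y.sum(axis=0) / y.sum()       # pooled fraction per gene
    mu = n * pi                        # expected counts under the null
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log(0) terms are defined as 0, so mask them explicitly.
        t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        t2 = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - mu)), 0.0)
    return 2.0 * (t1 + t2).sum(axis=0)


# Rank genes from most to least informative.
counts = np.array([[10, 0, 5], [10, 20, 5], [10, 0, 5]])
ranking = np.argsort(binomial_deviance(counts))[::-1]
```

Genes expressed at a constant fraction of each cell's UMIs score near zero. Genes concentrated in a subset of cells score high, so one would keep the top-ranked genes for dimension reduction.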

    Updated: 2019-12-23
  • Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
    Genome Biol. (IF 14.028) Pub Date : 2019-12-23
    Christoph Hafemeister; Rahul Satija

    Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is used as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and we overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure eliminates the need for heuristic steps, including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
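    The residual itself is simple to compute once a mean and dispersion are available. For a negative binomial with mean μ and inverse dispersion θ, the Pearson residual is z = (x − μ) / √(μ + μ²/θ). The NumPy sketch below is a deliberate simplification, not sctransform:

    - Instead of a per-gene regression on sequencing depth with regularized parameters, it assumes μ_ij = n_i·p_j (cell total times pooled gene fraction).
    - It uses one fixed θ.
    - It clips residuals to ±√(number of cells).

    All three choices are assumptions made for illustration.

```python
import numpy as np


def nb_pearson_residuals(counts, theta=100.0, clip=True):
    """Pearson residuals of a cells x genes count matrix under a negative binomial null.

    Simplifications (assumptions, not sctransform's fitted model): mu_ij = n_i * p_j,
    with n_i the cell's total count and p_j the gene's pooled fraction, and a single
    fixed inverse dispersion theta shared by all genes.
    """
    x = np.asarray(counts, dtype=float)
    n = x.sum(axis=1, keepdims=True)            # sequencing depth per cell
    p = x.sum(axis=0, keepdims=True) / x.sum()  # pooled gene fractions
    mu = n * p                                  # expected counts given depth
    var = mu + mu ** 2 / theta                  # negative binomial variance
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(var > 0, (x - mu) / np.sqrt(var), 0.0)
    if clip:
        bound = np.sqrt(x.shape[0])             # clip to +/- sqrt(number of cells)
        z = np.clip(z, -bound, bound)
    return z
```

Depth enters through μ, so cells with more UMIs are not automatically assigned larger values. That property is what lets such residuals stand in for log-transformed, pseudocount-adjusted counts in downstream steps such as PCA.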

    Updated: 2019-12-23
  • A comparison framework and guideline of clustering methods for mass cytometry data
    Genome Biol. (IF 14.028) Pub Date : 2019-12-23
    Xiao Liu; Weichen Song; Brandon Y. Wong; Ting Zhang; Shunying Yu; Guan Ning Lin; Xianting Ding

    With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations. To address this issue, we compared nine methods using three classes of performance measures: “precision” as external evaluation, “coherence” as internal evaluation, and stability. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification, and linear discriminant analysis (LDA)) are tested on six independent mass cytometry benchmark datasets. For each method, we compute and compare all defined performance measures under random subsampling, varying sample sizes, and varying numbers of clusters. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performances of PhenoGraph, Xshift, and flowMeans are affected by increased sample size, but FlowSOM remains relatively stable as sample size increases. All the evaluations, including precision, coherence, stability, and clustering resolution, should be considered together when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics to help the general reader choose the most suitable clustering tools.
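    “Precision” here means agreement with the manual labels. The summary does not specify the exact external metrics. As one representative example, the sketch below computes the adjusted Rand index (ARI) in plain Python. ARI is a standard chance-corrected agreement score between two partitions. A value of 1 means identical partitions up to relabeling, and values near 0 mean chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    total = comb(len(labels_true), 2)            # number of item pairs
    if total == 0:
        return 1.0
    # Pairs placed together in both labelings (from the contingency table).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total       # expected agreement by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                    # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because ARI depends only on which items are grouped together, it can compare an unsupervised clustering with manual labels even when the numbers of clusters differ.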

    Updated: 2019-12-23
  • PIRCh-seq: functional classification of non-coding RNAs associated with distinct histone modifications
    Genome Biol. (IF 14.028) Pub Date : 2019-12-20
    Jingwen Fang; Qing Ma; Ci Chu; Beibei Huang; Lingjie Li; Pengfei Cai; Pedro J. Batista; Karen Erisse Martin Tolentino; Jin Xu; Rui Li; Pengcheng Du; Kun Qu; Howard Y. Chang

    We develop PIRCh-seq, a method that enables a comprehensive survey of chromatin-associated RNAs in a histone modification-specific manner. We identify hundreds of chromatin-associated RNAs in several cell types with substantially less contamination by nascent transcripts. Non-coding RNAs are found to be enriched on chromatin and are classified into functional groups based on the patterns of their association with specific histone modifications. We find that single-stranded RNA bases are more chromatin-associated, and we discover hundreds of allele-specific RNA-chromatin interactions. These results provide a unique resource to globally study the functions of chromatin-associated lncRNAs and elucidate the basic mechanisms of chromatin-RNA interactions.

    Updated: 2019-12-20
  • What is the question?
    Genome Biol. (IF 14.028) Pub Date : 2019-12-19
    Itai Yanai; Martin Lercher

    This was [Stephen] Hawking’s central point in 1976 when he created something that came to be known as the information paradox. It was an extremely deep and important observation. It wasn’t important that Hawking didn’t get the right answer; he asked the right question. And this became a central debate that took twenty-five years to resolve.
    Leonard Susskind (https://www.youtube.com/watch?v=2DIl3Hfh9tY, minute 13:40)

    The single greatest misunderstanding about science by the public is that scientists solve problems; in reality, scientists are primarily concerned with creating them. We previously introduced François Jacob’s notions of day science, when we work with fixed targets in the lab or at the computer to solve problems, and night science, when our minds wander more freely to generate new ideas and find hidden connections [1]. It is certainly easier to imagine science as a logical, step-wise process. But it is the generation of a new question in the unpredictable and wandering process of night science that paves our way towards a discovery, effectively changing our perception of reality.

    Imagine being a fly on the wall of the office of Prof. Dr. Heinrich Friedrich Weber at the Polytechnic Institute in Zürich in 1900. “What can I do for you, Mr. Einstein?” asks the professor sternly as he looks at one of his least favorite students. “Professor,” the bold student begins, “what are the greatest open questions of theoretical physics? I wish to tackle them.” “Well, young man, as you would surely know had you regularly attended my lectures, there are three major unsolved problems today. I don’t think your talents are up to the task, but I will humor you with a retelling: How do we have to change our concept of time so that Maxwell’s equations are no longer in contradiction with the observed constancy of the speed of light? How can the absorption and emission of light in discrete packages avoid inconsistencies in our concept of black-body radiation?
    And finally: How can gravity be understood as deformations in space and time?” Equipped with these questions, young Einstein rushes back to his lonesome desk. The curious scientist tackles them one by one, braving each logical step as it comes, each leading him undeterred to elegant conclusions. He solves all three problems by the age of 40, transforming himself into the iconic scientist we know today.

    This is the high road of scientific progress: the leaders of a field identify the major open questions—the knowledge gaps in the “brick wall” of science—and then creative individuals around the world brood over them until someone derives the answer. To accelerate this scientific process, public lists of open scientific questions are not uncommon. Panels of cancer biologists list provocative questions singled out for funding [2]. Mathematicians have a list of seven unsolved “Millennium” problems, with a million-dollar prize for each solution [3]. The contributors to Wikipedia provide lists of open questions for 14 different disciplines, including physics, chemistry, biology, medicine, and neuroscience. So can we reasonably expect the leading scientists of each discipline to gather 10 years from now, nominate the bright minds that answered those questions for prizes and medals, and compile the next top ten lists? Surprisingly—or not, as we will argue—if you compare a list of the great discoveries in the life sciences over the 25 years leading up to 2015 with the list of questions provided early in this period, you notice very little overlap (Table 1).

    [Table 1. A comparison of the top 5 open problems in the life sciences posed in 1997 and the 5 biggest discoveries of the past 25 years listed in 2015]

    And as you might have guessed, Einstein did not have a top-three list of open questions to start with. What he did have were topics in the form of puzzling observations, puzzling primarily to himself. Let us take the first one as an example.
    When Einstein was still in school, he arrived at a fascinating paradox: if you imagined traveling parallel to a light beam at the speed of light, it should look like a standing, oscillating wave—but that would contradict Maxwell’s equations, which otherwise seemed so perfect at explaining the properties of electromagnetic radiation. For years, Einstein tried to find a way to modify Maxwell’s equations so that things would fall into place. He failed, again and again, until one night, coming home from a visit to a friend to whom he had complained about his failure, it dawned on him: it was not Maxwell’s fault. It was time’s. What if our notion of time itself was incorrect? In a moment when he was not consciously wrestling with equations but let his mind wander freely—in other words, in a bout of night science—Einstein had finally arrived at the very question that was the key to his conundrum: was there a way to change our concept of time that would make things fit? Einstein was not given the question. He discovered it.

    The trouble with trying to solve questions posed by communities is that the good ones are usually taken already, especially those that can be answered. Why, then, don’t we see the scientists around us spend their nights hunting for questions? Instead, it seems that having a clear question is a scientist’s natural state of mind; after all, the storyline of almost every scientific paper starts with a clearly defined question and then proceeds directly to the answer. In reality, the way scientists retell their discoveries may reflect much more how humans communicate knowledge than how those discoveries were actually made. It is not just that humans have always loved a good story [6]; a linearly structured exposition with logical steps is indeed the most effective way of instruction. Hidden behind the storylines of our papers, we may have spent long nights wandering around for questions.
    But once we stumbled upon the right one, it was transformative, often almost completely erasing our prior goals.

    We often see knowledge as a wall of information: individual pieces of knowledge fit together like bricks within the wall, summarizing what is known on a particular topic. This metaphor suggests that the way to advance science is to extend this wall of knowledge, strengthening it and thereby increasing its explanatory power, or extending it beyond the edges of a textbook. A hole in the wall is seen as a “knowledge gap,” and we can “flesh out” existing theories by closing such gaps. And indeed, addressing a specific problem may often lead to knowledge that fits squarely within the confines of a wall of knowledge. But this picture gives a false sense of the structure and rigidity of knowledge and its accumulation. The nature of discoveries is that they are unexpected: they may not fit neatly into our existing edifice of knowledge. Although the research may be originally motivated by a perceived gap, the knowledge resulting from the discovery may in fact not complete any part of the wall but instead may lead to the construction of a completely new and unexpected area: we may be forced to build a new wall orthogonal to the first, or even to tear down parts of the existing structure. This is an uncomfortable concept for many of us, who would prefer a tidy and beautiful universe, where a rational process helps us to illuminate the world. And yet, the most interesting unknowns of science are unknown unknowns—gaps that we were not even aware of before chancing upon them. A truly new question, as an unknown unknown, is not predictable, and generating it requires night science in addition to our day science work. This aspect of the research process is often hidden by the work that follows the invention of the question. In some cases, scientists may spend many years answering that question, as in the quote on Stephen Hawking at the outset.
    And while scientists are systematically taught the process of day science—experimental design and controls—they are typically immersed only slowly in the depths of night science: a student joining a lab is often presented with a hypothesis to work on and may see science as a hypothesis-testing endeavor. Many young postdocs have been told that as PhD students their job was to answer questions—now they have to discover their own unknown unknowns.

    Community-generated questions such as those in the left columns of Tables 1 and 2 are typically so general that they do not provide a new direction towards an answer. Answering one of them almost always requires a rephrasing, a refocusing of the original question, which exposes a new aspect of the problem and only becomes possible after an insight into the phenomenon at hand. As an example, “Does the microbiome affect a tumor’s growth?” is a valid question, but it can only serve as a starting point for our explorations. After some initial analyses and much subsequent night science, we might go back and ask, for example, “Does a tumor manipulate the microbiome as a kind of co-conspirator?” or “Can bacteria become intracellular components of a cancer cell?” These may lead to hypotheses that are testable and novel.

    [Table 2. Rephrased questions that led to scientific breakthroughs]

    Sometimes, such new questions will not even be posed in response to a specific public question, yet may lead to answering it in an unexpected way. Our ignorance about a certain topic often provides a fertile ground for novel questions [12]. Discovering the question follows from immersion in a particular topic. Francisco Mojica, for example, provided a radically new hypothesis for why bacterial genomes have a structure that previous researchers had termed “clustered regularly interspaced short palindromic repeats” (or CRISPR), separated by evenly sized “spacers” of apparently random DNA.
    Before Mojica’s work, not many scientists were interested in these peculiar structures, and the problem of CRISPR elements could have been stated as “Why do bacteria have CRISPR elements?” But this question is too general to be solved, lacking any hints at where to look for the answer. The inconspicuous spacers were largely ignored. Mojica, however, asked [11]: what does the similarity of the spacers to known DNA sequences tell us about their function? Again, it was the question that led the way, generated in night science but requiring rigorous day science for its answer: the spacers are copies of viral sequences, guiding an adaptive bacterial immune system towards the destruction of those viruses. Table 2 lists more examples of questions whose refocusing led to breakthroughs.

    As the Susskind quote about Stephen Hawking above shows, if a scientist proposes an important question and provides an answer to it that is later deemed wrong, the scientist will still be credited with posing the question. This is because the framing of a fundamentally new question lies, by definition, beyond what we can expect within our frame of knowledge: while answering a question relies upon logic, coming up with a new question often rests on an illogical leap into the unknown—the hallmark of night science. Why, then, does it not seem this way? Why do questions appear secondary to answers? It may be because a new question is so powerful that it transforms our reality. A new question tends to erase its own origin; it is hard to imagine that there ever was a time when the question was not there. The effort immediately shifts to figuring out the answer to the new aspect of reality illuminated by the question.

    To get a sense of this, consider the weekly New Yorker Cartoon Contest, where you can propose a funny caption for a caption-less cartoon. This is a difficult challenge, as anyone who has attempted it will appreciate (try it yourself in Fig. 1).
    The minute you read someone else’s caption, though, you are tied to that particular solution (there is one hidden in the caption of Fig. 2). Likewise, a new scientific question seems obvious once stated (such as “What can you learn from the similarities of CRISPR spacers to known DNA sequences?”), but that should not lead us to think that the question’s introduction was obvious, too.

    [Fig. 1. A New Yorker cartoon contest. Can you think of some funny caption to make sense of the cartoon? (Credit: www.JackZiegler.com, licensed from the New Yorker issue May 9, 2005)]

    [Fig. 2. The perceived (day science) and hidden (night science) view of the scientific method. (The caption for the winning cartoon in Fig. 1 is “Neither the time nor the place, Doug!”)]

    Finding the question can be fun, as in thinking of a cartoon caption. But it can also be extremely difficult psychologically. Scientists are often expected by the public to know it all, and yet “feeling stupid” is a common mode of operation for us [13]. Science is the art of dealing with things we do not know enough about. As Wernher von Braun, the father of the German and US rocket programs, phrased it: “Research is what I’m doing when I don’t know what I’m doing.” Science is humbling in this way. For young scientists, it is often very difficult to understand that it is perfectly normal not to know the answer—or even the question. Learning to embrace this uncertainty is part of our maturation as scientists.

    Uri Alon has an intuitive image to describe the process of re-finding our questions [14]. Given what we know about a given topic “A,” a researcher predicts that it should be possible to arrive at point “B,” a scientific destination that seems interesting: a hypothesis. However, the plot inevitably thickens over the course of the research project, and new hurdles force the scientist into a meandering path.
    Soon the researcher is lost, with neither the starting point (which suddenly seems shaky) nor the end point (which appears unreachable) in sight. Uri calls this “being in the cloud”: you have lost your original question, but the reason why this has occurred is strange and thus potentially exciting and itself worthy of study. From inside the cloud, the situation may seem desperate, but Uri sees the cloud as the hallmark of science: if you are in the cloud, then you might have stumbled upon something non-obvious and interesting. “I’m very confused,” a student would tell Uri, to which he would reply, “Oh good, so you’re in the cloud!” Eventually, a new question that arose inside the cloud may lead the way to an unexpected destination “C.”

    The scientific method is often perceived as a simple sequence that leads from a problem to an answer, possibly through long iterations of modified hypotheses. But our reality is much less structured: it often starts with a topic and some observations, leading to the finding of patterns and questions about those patterns, possibly long before we have any explicit hypothesis or any direct tests (Fig. 2). And even if a project starts out with a very specific hypothesis, in our experience, it still generally arrives at a very different point than expected. In some way, then, night science may be most productive when it has no agenda, when there are no particular questions it is trying to reshape or resolve. When the scientist does not have a hypothesis, she is free to explore, to make connections. In some sense, any kind of expectation of how things are to behave (a hypothesis) is a liability that could obstruct a new idea that awaits our discovery. Once night science has uncovered and reframed such a question, the researcher can use the full power of day science to solve it. In this sense, a major discovery is typically both the solution and the problem.
    Much of basic, curiosity-driven science is exploration, and night science is a fundamental part of that; yet funding bodies often demand that research be hypothesis-driven. But while some parts of night science can be done with the help of an armchair and some good coffee, other parts require the exploration of large and complicated data sets. If no funding is provided for such endeavors, the generation of new questions may be stifled, hindering scientific progress: in science, the problem that is eventually solved is often not the one that was initially sought out.

    To be sure, each of us spends much of our time solving questions that have already been posed. For example, we might work out the particular regulatory structure of a gene or the evolution of a gene family. Often, the hope is that this immediate problem, once solved, will lead to a new and exciting question. A case in point was the sequencing of the human genome: the initial scientific question was clear (“What is the DNA sequence of a human genome?”), but the really exciting questions about our genome biology arose only afterwards. If an idea is truly unexpected, then we could not have arrived at it solely through existing questions; instead, we had to navigate through night science, moving from disparate observations to previously unknown questions. It is freeing and exhilarating to embrace this uncertainty, to fly right into the heart of the cloud, even if we may feel stupid and lost there. Night science, that realm where questions and ideas are born, appears so mysterious that it is often not described at all. But it is our premise that there are patterns to it, and this is what will occupy us in the following installments of this mini-series.

    References
    1. Yanai I, Lercher M. Night science. Genome Biol. 2019;20:179.
    2. NCI posts new “Provocative Questions”. Cancer Discov. 2015;5:569–70. http://www.ncbi.nlm.nih.gov/pubmed/25929849. Accessed 26 Oct 2019.
    3. CMI. Millennium Prize Problems. http://www.claymath.org/millennium-problems/millennium-prize-problems. Accessed 12 Dec 2019.
    4. Dev SB. Unsolved problems in biology: the state of current thinking. Prog Biophys Mol Biol. 2015;117(2–3):232–9. https://doi.org/10.1016/j.pbiomolbio.2015.02.001.
    5. 5 Important Breakthroughs in Biology from the Last 25 Years. Brainscape Blog. https://www.brainscape.com/blog/2015/06/biology-breakthroughs-and-discoveries/. Accessed 30 Oct 2019.
    6. Harari YN. Sapiens: a brief history of humankind. Harper; 2014.
    7. Curie MS. Rays emitted by compounds of uranium and of thorium. Comptes Rendus. 1898;126:1101–3.
    8. Darwin C. On the origin of species by means of natural selection, or, the preservation of favoured races in the struggle for life. J. Murray; 1859.
    9. Gödel K. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Math und Phys. 1931;38:173–98.
    10. McClintock B. The origin and behavior of mutable loci in maize. Proc Natl Acad Sci U S A. 1950;36:344–55.
    11. Mojica FJM, Díez-Villaseñor C, Soria E, Juez G. Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol Microbiol. 2000;36(1):244–6.
    12. Firestein S. Ignorance: how it drives science. New York: Oxford University Press; 2012.
    13. Schwartz MA. The importance of stupidity in scientific research. J Cell Sci. 2008;121(11):1771. https://doi.org/10.1242/jcs.033340.
    14. Alon U. How to choose a good scientific problem. Mol Cell. 2009;35(6):726–8. https://doi.org/10.1016/j.molcel.2009.09.013.

    Acknowledgements: We thank Felicia Kuperwaser, Gal Avital, and Veronica Maurino for critical comments. We also thank the NYU night science club.
    Affiliations: Institute for Computational Medicine, NYU Langone Health, New York, NY, 10016, USA (Itai Yanai); Institute for Computer Science & Department of Biology, Heinrich Heine University, 40225, Düsseldorf, Germany (Martin Lercher).
    Contributions: IY and MJL developed the ideas and wrote the manuscript together. Both authors read and approved the final manuscript.
    Corresponding authors: Correspondence to Itai Yanai or Martin Lercher.
    Competing interests: The authors declare that they have no competing interests.
    Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
    Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
    Cite this article: Yanai, I., Lercher, M. What is the question? Genome Biol 20, 289 (2019). doi:10.1186/s13059-019-1902-1
    Received 31 October 2019; Accepted 27 November 2019; Published 19 December 2019. DOI: https://doi.org/10.1186/s13059-019-1902-1

    Updated: 2019-12-19
  • Genotype-free demultiplexing of pooled single-cell RNA-seq
    Genome Biol. (IF 14.028) Pub Date : 2019-12-19
    Jun Xu; Caitlin Falconer; Quan Nguyen; Joanna Crawford; Brett D. McKinnon; Sally Mortlock; Anne Senabouth; Stacey Andersen; Han Sheng Chiu; Longda Jiang; Nathan J. Palpant; Jian Yang; Michael D. Mueller; Alex W. Hewitt; Alice Pébay; Grant W. Montgomery; Joseph E. Powell; Lachlan J.M Coin

    A variety of methods have been developed to demultiplex pooled samples in a single-cell RNA sequencing (scRNA-seq) experiment, but they require either hashtag barcodes or sample genotypes obtained prior to pooling. We introduce scSplit, which uses genetic differences inferred from scRNA-seq data alone to demultiplex pooled samples. scSplit also enables mapping clusters to original samples. Using simulated, merged, and pooled multi-individual datasets, we show that scSplit predictions are highly concordant with demuxlet predictions and highly consistent with the known truth in a cell-hashing dataset. scSplit is ideally suited to samples without external genotype information and is available at: https://github.com/jon-xu/scSplit
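    The sketch below shows how genetic differences alone can separate pooled cells. It clusters cells by their alternative-allele read counts at shared SNVs, using expectation-maximization (EM) over a mixture of binomials, and models each hypothetical donor as a vector of alt-allele probabilities. It is a generic sketch under our own assumptions, not scSplit's actual model or code:

    - The number of donors k is known.
    - SNVs are independent.
    - Clusters start from a farthest-point initialization.
    - Allele probabilities are smoothed with a 0.5/1.0 pseudocount.

```python
import numpy as np


def demux_em(alt, total, k, n_iter=50):
    """Cluster cells (rows) into k hypothetical donors from allele counts at SNVs (columns).

    alt[i, j]   -- reads carrying the alternative allele in cell i at SNV j
    total[i, j] -- all reads covering SNV j in cell i (0 where uncovered)
    Returns (assignment per cell, k x n_snvs matrix of alt-allele probabilities).
    Generic binomial-mixture EM for illustration, not scSplit's implementation.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    alt = np.asarray(alt, dtype=float)
    ref = np.asarray(total, dtype=float) - alt
    # Smoothed per-cell allele fractions; uncovered SNVs default to 0.5.
    frac = (alt + 0.5) / (alt + ref + 1.0)
    # Farthest-point initialization (our choice): start from cell 0, then add the
    # cell most distant from all seeds chosen so far.
    seeds = [0]
    for _ in range(1, k):
        dist = np.min([((frac - frac[s]) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(int(dist.argmax()))
    p = frac[seeds]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: binomial log-likelihood of each cell under each donor; the binomial
        # coefficient is identical across donors, so it is dropped.
        loglik = alt @ np.log(p).T + ref @ np.log1p(-p).T + np.log(weights)
        loglik -= loglik.max(axis=1, keepdims=True)
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing proportions and pseudocount-smoothed allele probabilities.
        weights = resp.mean(axis=0)
        p = (resp.T @ alt + 0.5) / (resp.T @ (alt + ref) + 1.0)
    return resp.argmax(axis=1), p
```

Cells with no reads at an SNV contribute nothing to that SNV's likelihood, which matters for sparse scRNA-seq coverage. A practical tool would also need to handle doublets, which this sketch ignores.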

    Updated: 2019-12-19
  • Paragraph: a graph-based structural variant genotyper for short-read sequence data
    Genome Biol. (IF 14.028) Pub Date : 2019-12-19
    Sai Chen; Peter Krusche; Egor Dolzhenko; Rachel M. Sherman; Roman Petrovski; Felix Schlesinger; Melanie Kirsche; David R. Bentley; Michael C. Schatz; Fritz J. Sedlazeck; Michael A. Eberle

    Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples using long-read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ancestry. Our analysis shows that Paragraph has better accuracy than other existing genotypers and can be applied to population-scale studies.

    Updated: 2019-12-19
  • TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools
    Genome Biol. (IF 14.028) Pub Date : 2019-12-18
    Cécile Monat; Sudharsan Padmarasu; Thomas Lux; Thomas Wicker; Heidrun Gundlach; Axel Himmelbach; Jennifer Ens; Chengdao Li; Gary J. Muehlbauer; Alan H. Schulman; Robbie Waugh; Ilka Braumann; Curtis Pozniak; Uwe Scholz; Klaus F. X. Mayer; Manuel Spannagl; Nils Stein; Martin Mascher

    Chromosome-scale genome sequence assemblies underpin pan-genomic studies. Recent genome assembly efforts in the large-genome Triticeae crops wheat and barley have relied on the commercial closed-source assembly algorithm DeNovoMagic. We present TRITEX, an open-source computational workflow that combines paired-end, mate-pair, and 10X Genomics linked-read data with chromosome conformation capture sequencing data to construct sequence scaffolds with megabase-scale contiguity ordered into chromosomal pseudomolecules. We evaluate the performance of TRITEX on publicly available sequence data of tetraploid wild emmer and hexaploid bread wheat, and construct an improved annotated reference genome sequence assembly of the barley cultivar Morex as a community resource.

    Updated: 2019-12-19
  • Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type
    Genome Biol. (IF 14.028) Pub Date : 2019-12-18
    Irene Franco; Hafdis T. Helgadottir; Aldo Moggio; Malin Larsson; Peter Vrtačnik; Anna Johansson; Nina Norgren; Pär Lundin; David Mas-Ponte; Johan Nordström; Torbjörn Lundgren; Peter Stenvinkel; Lars Wennberg; Fran Supek; Maria Eriksson

    The lifelong accumulation of somatic mutations underlies age-related phenotypes and cancer. Mutagenic forces are thought to shape the genome of aging cells in a tissue-specific way. Whole genome analyses of somatic mutation patterns, based on both types and genomic distribution of variants, can shed light on specific processes active in different human tissues and their effect on the transition to cancer. To analyze somatic mutation patterns, we compile a comprehensive genetic atlas of somatic mutations in healthy human cells. High-confidence variants are obtained from newly generated and publicly available whole genome DNA sequencing data from single non-cancer cells, clonally expanded in vitro. To enable a well-controlled comparison of different cell types, we obtain single genome data (92% mean coverage) from multi-organ biopsies from the same donors. These data show multiple cell types that are protected from mutagens and display a stereotyped mutation profile, despite originating from different tissues. Conversely, the same tissue harbors cells with distinct mutation profiles associated with different differentiation states. Analyses of mutation rate in the coding and non-coding portions of the genome identify a cell type bearing a unique mutation pattern characterized by mutation enrichment in active chromatin, regulatory, and transcribed regions. Our analysis of normal cells from healthy donors identifies a somatic mutation landscape that enhances the risk of tumor transformation in a specific cell population from the kidney proximal tubule. This unique pattern is characterized by a high rate of mutation accumulation during adult life and specific targeting of expressed genes and regulatory regions.

    Updated: 2019-12-19
  • Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination
    Genome Biol. (IF 14.028) Pub Date : 2019-12-18
    Arthur W. Pightling; James B. Pettengill; Yu Wang; Hugh Rand; Errol Strain

    Although contamination in bacterial whole-genome sequencing is assumed to cause errors, its influence on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequence typing, has not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. These errors arise from contaminant reads that map to references or become incorporated into chimeric sequences during assembly. Contamination sufficient to influence clustering analyses is present in public sequence databases.

    Updated: 2019-12-19
  • Quantifying the benefit offered by transcript assembly with Scallop-LR on single-molecule long reads
    Genome Biol. (IF 14.028) Pub Date : 2019-12-18
    Laura H. Tung; Mingfu Shao; Carl Kingsford

    Single-molecule long-read sequencing has been used to improve mRNA isoform identification. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and sequencing length limits. This drives a need for long-read transcript assembly. By adding long-read-specific optimizations to Scallop, we developed Scallop-LR, a reference-based long-read transcript assembler. Analyzing 26 PacBio samples, we quantified the benefit of performing transcript assembly on long reads. We demonstrate that Scallop-LR identifies more known transcripts and potentially novel isoforms for the human transcriptome than Iso-Seq Analysis and StringTie, indicating that long-read transcript assembly by Scallop-LR can reveal a more complete human transcriptome.

    Updated: 2019-12-19
  • Systematic underestimation of the epigenetic clock and age acceleration in older subjects
    Genome Biol. (IF 14.028) Pub Date : 2019-12-17
    Louis Y. El Khoury; Tyler Gorrie-Stone; Melissa Smart; Amanda Hughes; Yanchun Bao; Alexandria Andrayas; Joe Burrage; Eilis Hannon; Meena Kumari; Jonathan Mill; Leonard C. Schalkwyk

    The Horvath epigenetic clock is widely used. It predicts age quite well from the DNA methylation profile at 353 CpG sites in samples of unknown age and has been used to calculate “age acceleration” in various tissues and environments. However, the model systematically underestimates age in tissues from older people. This is seen in all examined tissues but most strongly in the cerebellum and is consistently observed in multiple datasets. Age acceleration is thus age-dependent, and this can lead to spurious associations. The current literature includes examples of association tests with age acceleration calculated in a wide variety of ways. The concept of an epigenetic clock is compelling, but caution should be taken in interpreting associations with age acceleration. Association tests of age acceleration should include age as a covariate.

    Updated: 2019-12-18
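The epigenetic-clock abstract above notes that the clock underestimates age in older subjects, so "age acceleration" computed as a raw difference inherits an age trend. A minimal Python sketch, assuming a hypothetical clock whose predictions rise with a slope of 0.8 instead of 1 (the ages and predictions are invented toy values, not data from the paper): the raw difference (DNAm age minus chronological age) is negatively correlated with age, whereas the residual from regressing DNAm age on chronological age is uncorrelated with age within the fitted sample. The helper names `ols_fit`, `pearson`, and `age_acceleration` are illustrative only.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def age_acceleration(chron_age, dnam_age):
    """Return (difference-based, residual-based) age acceleration."""
    diff = [d - c for c, d in zip(chron_age, dnam_age)]
    a, b = ols_fit(chron_age, dnam_age)
    resid = [d - (a + b * c) for c, d in zip(chron_age, dnam_age)]
    return diff, resid


# Hypothetical toy cohort: ages 20, 25, ..., 90 and a clock with slope 0.8,
# so older subjects are underestimated (alternating +/-2 years of "noise").
ages = list(range(20, 95, 5))
dnam = [10 + 0.8 * a + (2 if i % 2 == 0 else -2) for i, a in enumerate(ages)]
diff, resid = age_acceleration(ages, dnam)
print(f"corr(age, difference) = {pearson(ages, diff):.2f}")
print(f"corr(age, residual)   = {pearson(ages, resid):.2f}")
```

The residual only removes the linear age trend within the sample it was fitted on; the abstract's recommendation to include age as a covariate in association tests still applies.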
  • OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries
    Genome Biol. (IF 14.028) Pub Date : 2019-12-18
    Lin An; Tao Yang; Jiahao Yang; Johannes Nuebler; Guanjue Xiang; Ross C. Hardison; Qunhua Li; Yu Zhang

    The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.

    Updated: 2019-12-18
  • deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Bo Liu; Yadong Liu; Junyi Li; Hongzhe Guo; Tianyi Zang; Yadong Wang

    The alignment of long-read RNA sequencing reads is non-trivial due to high sequencing error rates and complicated gene structures. We propose deSALT, a tailored two-pass alignment approach, which constructs graph-based alignment skeletons to infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and sequencing errors, overcoming bottlenecks in long RNA-seq read alignment. Benchmarks demonstrate that deSALT has a greater ability to produce accurate and homogeneous full-length alignments. deSALT is available at: https://github.com/hitbc/deSALT.

    Updated: 2019-12-17
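The deSALT abstract above describes an index based on a de Bruijn graph and alignment skeletons used to infer exons. The sketch below is not deSALT's implementation; it is a minimal Python illustration under simplifying assumptions (exact k-mer matches only, no error model, an invented toy gene). It indexes the reference as a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers with their reference positions, then groups a read's k-mer hits by diagonal (reference position minus read offset), so that a spliced read yields one cluster of hits per exon, a crude alignment skeleton. `build_dbg`, `seed_hits`, and `diagonal_votes` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_dbg(reference, k):
    """De Bruijn graph of `reference`: nodes are (k-1)-mers, each k-mer is an
    edge prefix -> suffix. Also records every position where each k-mer occurs."""
    edges = defaultdict(set)
    kmer_pos = defaultdict(list)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        edges[kmer[:-1]].add(kmer[1:])
        kmer_pos[kmer].append(i)
    return edges, kmer_pos


def seed_hits(read, kmer_pos, k):
    """Exact k-mer matches as (read_offset, reference_position) pairs."""
    hits = []
    for j in range(len(read) - k + 1):
        for pos in kmer_pos.get(read[j:j + k], ()):
            hits.append((j, pos))
    return hits


def diagonal_votes(hits):
    """Count hits per diagonal; each well-supported diagonal suggests one exon block."""
    return Counter(pos - j for j, pos in hits)


# Toy gene: exon1 + intron (GT...AG) + exon2; the read is the spliced transcript.
exon1, exon2 = "ATGCGTACCA", "GGATCCTTGA"
intron = "GT" + "A" * 20 + "AG"
reference = "CC" + exon1 + intron + exon2 + "CC"
read = exon1 + exon2

k = 5
edges, kmer_pos = build_dbg(reference, k)
votes = diagonal_votes(seed_hits(read, kmer_pos, k))
print(votes.most_common(2))  # [(2, 7), (26, 6)]: one diagonal per exon
```

The two diagonals differ by 24, which is exactly the intron length in this toy example; a real aligner would then refine each block and the splice junction between them.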
  • Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Shujun Ou; Weija Su; Yi Liao; Kapeel Chougule; Jireh R. A. Agda; Adam J. Hellinga; Carlos Santiago Blanco Lugo; Tyler A. Elliott; Doreen Ware; Thomas Peterson; Ning Jiang; Candice N. Hirsch; Matthew B. Hufford

    Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted-repeat transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, false discovery rate (FDR), and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

    Updated: 2019-12-17
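The EDTA abstract above lists sensitivity, specificity, accuracy, precision, FDR, and F1 as performance metrics. A minimal Python sketch of the standard definitions, assuming (as a simplification) that each metric is computed per base pair by comparing a tool's TE annotation mask against a curated reference mask. The 10-bp masks are invented for illustration, and `te_metrics` is a hypothetical helper, not part of EDTA.

```python
def te_metrics(truth, pred):
    """Per-base-pair benchmark metrics for two equal-length boolean masks.

    truth[i] is True if base i is a TE in the curated library annotation;
    pred[i] is True if the tool under test annotated base i as a TE.
    """
    if len(truth) != len(pred):
        raise ValueError("masks must have the same length")
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum(p and not t for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    tn = len(truth) - tp - fp - fn

    def ratio(num, den):
        # Guard against empty denominators (e.g. a tool that calls nothing).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(truth)),
        "precision": precision,
        "fdr": ratio(fp, tp + fp),  # equals 1 - precision when tp + fp > 0
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical 10-bp region: curated TE at bases 2-7, tool calls bases 1-5.
truth = [c == "1" for c in "0011111100"]
pred = [c == "1" for c in "0111110000"]
for name, value in te_metrics(truth, pred).items():
    print(f"{name:12s} {value:.3f}")
```

Here the tool finds 4 of the 6 curated TE bases and makes 1 false call, so precision is 0.8 and F1 is 8/11; a whole-genome benchmark would apply the same counting over every base.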
  • Curing hemophilia A by NHEJ-mediated ectopic F8 insertion in the mouse
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Jian-Ping Zhang; Xin-Xin Cheng; Mei Zhao; Guo-Hua Li; Jing Xu; Feng Zhang; Meng-Di Yin; Fei-Ying Meng; Xin-Yue Dai; Ya-Wen Fu; Zhi-Xue Yang; Cameron Arakaki; Ruijun Jeanna Su; Wei Wen; Wen-Tian Wang; Wanqiu Chen; Hannah Choi; Charles Wang; Guangping Gao; Lei Zhang; Tao Cheng; Xiao-Bing Zhang

    Hemophilia A, a bleeding disorder resulting from F8 mutations, can only be cured by gene therapy. A promising strategy is CRISPR-Cas9-mediated precise insertion of F8 in hepatocytes at highly expressed gene loci, such as albumin (Alb). Unfortunately, the precise in vivo integration efficiency of a long insert is very low (~ 0.1%). We report that the use of a double-cut donor leads to a 10- to 20-fold increase in liver editing efficiency, thereby completely reconstituting serum F8 activity in a mouse model of hemophilia A after hydrodynamic injection of Cas9-sgAlb and B domain-deleted (BDD) F8 donor plasmids. We find that the integration of a double-cut donor at the Alb locus in mouse liver is mainly through non-homologous end joining (NHEJ)-mediated knock-in. We then target BDDF8 to multiple sites on introns 11 and 13 and find that NHEJ-mediated insertion of BDDF8 restores hemostasis. Finally, using 3 AAV8 vectors to deliver genome editing components, including Cas9, sgRNA, and BDDF8 donor, we observe the same therapeutic effects. A follow-up of 100 mice over 1 year shows no adverse effects. These findings lay the foundation for curing hemophilia A by NHEJ knock-in of BDDF8 at Alb introns after AAV-mediated delivery of editing components.

    Updated: 2019-12-17
  • SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Manish Goel; Hequan Sun; Wen-Biao Jiao; Korbinian Schneeberger

    Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequences change in location, orientation, or copy number. Here, we present SyRI, a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions. This distinction is important because rearranged regions are inherited differently from syntenic regions.

    Updated: 2019-12-17
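The SyRI abstract above separates syntenic from rearranged regions before looking for local differences. As a heavily simplified illustration of that first step (not SyRI's actual algorithm), the Python sketch below takes hypothetical alignment blocks between two chromosome-level assemblies and selects the heaviest chain of forward-strand blocks that are colinear in both genomes as the syntenic backbone, using a weighted longest-chain dynamic program. It labels the remaining blocks as inverted (reverse strand) or translocated (forward strand but out of order). Real data would also need handling of duplications, overlapping blocks, and inverted translocations. `Block`, `syntenic_chain`, and `classify` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One alignment block between a reference and a query assembly (0-based, half-open)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    strand: str  # "+" = same orientation, "-" = reverse complement

    @property
    def length(self) -> int:
        return self.ref_end - self.ref_start


def syntenic_chain(blocks):
    """Heaviest chain of forward blocks increasing in both ref and query coordinates."""
    fwd = sorted((b for b in blocks if b.strand == "+"),
                 key=lambda b: (b.ref_start, b.qry_start))
    if not fwd:
        return []
    score = [b.length for b in fwd]  # best chain weight ending at block i
    prev = [None] * len(fwd)         # back-pointer for traceback
    for i, bi in enumerate(fwd):
        for j in range(i):
            bj = fwd[j]
            colinear = bj.ref_end <= bi.ref_start and bj.qry_end <= bi.qry_start
            if colinear and score[j] + bi.length > score[i]:
                score[i], prev[i] = score[j] + bi.length, j
    i = max(range(len(fwd)), key=score.__getitem__)
    chain = []
    while i is not None:
        chain.append(fwd[i])
        i = prev[i]
    return chain[::-1]


def classify(blocks):
    """Label each block as syntenic, inverted, or translocated."""
    syntenic = set(syntenic_chain(blocks))
    return {b: ("syntenic" if b in syntenic
                else "inverted" if b.strand == "-"
                else "translocated")
            for b in blocks}


# Hypothetical blocks: A and D are colinear, C is inverted, B moved in the query.
A = Block(0, 100, 0, 100, "+")
B = Block(100, 200, 350, 450, "+")
C = Block(200, 300, 100, 200, "-")
D = Block(300, 450, 200, 350, "+")
for block, label in classify([A, B, C, D]).items():
    print(block.ref_start, block.ref_end, label)
```

Weighting the chain by aligned length, rather than by block count, keeps a few large colinear blocks in the syntenic backbone even when many small blocks are shuffled.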
  • Transcriptome assembly from long-read RNA-seq alignments with StringTie2
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Sam Kovaka; Aleksey V. Zimin; Geo M. Pertea; Roham Razaghi; Steven L. Salzberg; Mihaela Pertea

    RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.

    Updated: 2019-12-17
  • PASTMUS: mapping functional elements at single amino acid resolution in human cells
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Xinyi Zhang; Di Yue; Yinan Wang; Yuexin Zhou; Ying Liu; Yeting Qiu; Feng Tian; Ying Yu; Zhuo Zhou; Wensheng Wei

    Identification of functional elements for a protein of interest is important for achieving a mechanistic understanding. However, it remains cumbersome to assess the functional significance of each and every amino acid of a given protein. Here, we report a strategy, PArsing fragmented DNA Sequences from CRISPR Tiling MUtagenesis Screening (PASTMUS), which provides a streamlined workflow and a bioinformatics pipeline to identify critical amino acids of proteins in their native biological contexts. Using this approach, we map six proteins (three bacterial toxin receptors and three cancer drug targets) and acquire their corresponding functional maps at amino acid resolution.

    Updated: 2019-12-17
  • HOPS: automated detection and authentication of pathogen DNA in archaeological remains
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Ron Hübler; Felix M. Key; Christina Warinner; Kirsten I. Bos; Johannes Krause; Alexander Herbig

    High-throughput DNA sequencing enables large-scale metagenomic analyses of complex biological systems. Such analyses are not restricted to present-day samples and can also be applied to molecular data from archaeological remains. Investigations of ancient microbes can provide valuable information on past bacterial commensals and pathogens, but their molecular detection remains a challenge. Here, we present HOPS (Heuristic Operations for Pathogen Screening), an automated bacterial screening pipeline for ancient DNA sequences that provides detailed information on species identification and authenticity. HOPS is a versatile tool for high-throughput screening of DNA from archaeological material to identify candidates for genome-level analyses.

    Updated: 2019-12-17
  • Guidelines for benchmarking of optimization-based approaches for fitting mathematical models
    Genome Biol. (IF 14.028) Pub Date : 2019-12-16
    Clemens Kreutz

    Insufficient performance of optimization-based approaches for fitting mathematical models is still a major bottleneck in systems biology. In this article, the reasons and methodological challenges are summarized, as well as their impact on benchmark studies. Important aspects for achieving an increased level of evidence for benchmark results are discussed. Based on general guidelines for benchmarking in computational biology, a collection of tailored guidelines is presented for performing informative and unbiased benchmarking of optimization-based fitting approaches. Comprehensive benchmark studies based on these recommendations are urgently required to establish a robust and reliable methodology for the systems biology community.

    Updated: 2019-12-17
  • Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference
    Genome Biol. (IF 14.028) Pub Date : 2019-12-13
    Yuanhua Huang; Davis J. McCarthy; Oliver Stegle

    Multiplexed single-cell RNA-seq analysis of multiple samples using pooling is a promising experimental design, offering increased throughput while helping to overcome batch variation. To reconstruct the sample identity of each cell, genetic variants that segregate between the samples in the pool have been proposed as natural barcodes for cell demultiplexing. Existing demultiplexing strategies rely on the availability of complete genotype data from the pooled samples, which limits the applicability of such methods, in particular when genetic variation is not the primary object of study. To address this, we present Vireo, a computationally efficient Bayesian model to demultiplex single-cell data from pooled experimental designs. Uniquely, our model can be applied in settings where only partial or no genotype information is available. Using pools based on synthetic mixtures and results on real data, we demonstrate the robustness of Vireo and illustrate the utility of multiplexed experimental designs for common expression analyses.

    Updated: 2019-12-13
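The Vireo and scSplit abstracts above both demultiplex pooled cells using genetic variants, without requiring reference genotypes. Vireo itself is a Bayesian model with its own inference scheme and can use partial genotype information. The Python sketch below is a deliberately simpler stand-in: an expectation-maximization (EM) fit of a binomial mixture in which each hypothetical donor has a continuous alt-allele frequency per variant, estimated only from each cell's alt and total read counts. It does not model doublets, genotype priors, or partial genotypes. The data are simulated toy values, and `simulate`, `em_demultiplex`, and `matched_accuracy` are hypothetical names. Each restart seeds the donors from randomly chosen cells, and the restart with the highest log-likelihood is kept, because EM can stop at a local optimum.

```python
import math
import random
from itertools import permutations


def simulate(n_donors, cells_per_donor, n_variants, rng):
    """Toy pooled data: alt/total read counts per cell and variant, plus true donor labels."""
    levels = (0.02, 0.5, 0.98)  # hom-ref, het, hom-alt, allowing a small error rate
    theta = [[rng.choice(levels) for _ in range(n_variants)] for _ in range(n_donors)]
    alt, dep, labels = [], [], []
    for k in range(n_donors):
        for _ in range(cells_per_donor):
            # Sparse coverage: about half the variants get 0 reads, the rest 1-3 reads.
            d = [0 if rng.random() < 0.5 else rng.randint(1, 3) for _ in range(n_variants)]
            a = [sum(rng.random() < theta[k][v] for _ in range(d[v])) for v in range(n_variants)]
            alt.append(a)
            dep.append(d)
            labels.append(k)
    return alt, dep, labels


def cell_loglik(a, d, th):
    """Binomial log-likelihood of one cell's counts (binomial coefficients omitted)."""
    return sum(ai * math.log(t) + (di - ai) * math.log(1 - t)
               for ai, di, t in zip(a, d, th) if di)


def em_demultiplex(alt, dep, n_donors, rng, n_init=10, n_iter=30, prior=1.0):
    """Assign cells to donors by EM on a binomial mixture; best of n_init restarts."""
    n_cells, n_var = len(alt), len(alt[0])
    best_ll, best_labels = -math.inf, None
    for _ in range(n_init):
        # Initialize each donor's allele frequencies from one randomly chosen seed cell.
        seeds = rng.sample(range(n_cells), n_donors)
        theta = [[(alt[s][v] + prior) / (dep[s][v] + 2 * prior) for v in range(n_var)]
                 for s in seeds]
        weights = [1.0 / n_donors] * n_donors
        for _ in range(n_iter):
            # E-step: posterior probability that each cell came from each donor.
            resp, ll = [], 0.0
            for c in range(n_cells):
                logp = [math.log(weights[j]) + cell_loglik(alt[c], dep[c], theta[j])
                        for j in range(n_donors)]
                m = max(logp)
                z = sum(math.exp(x - m) for x in logp)
                resp.append([math.exp(x - m) / z for x in logp])
                ll += m + math.log(z)
            # M-step: mixing weights and pseudo-count-smoothed allele frequencies.
            for j in range(n_donors):
                weights[j] = max(sum(r[j] for r in resp) / n_cells, 1e-12)
                theta[j] = [
                    (sum(resp[c][j] * alt[c][v] for c in range(n_cells)) + prior)
                    / (sum(resp[c][j] * dep[c][v] for c in range(n_cells)) + 2 * prior)
                    for v in range(n_var)
                ]
        if ll > best_ll:
            best_ll = ll
            best_labels = [max(range(n_donors), key=r.__getitem__) for r in resp]
    return best_labels


def matched_accuracy(pred, truth, n_donors):
    """Accuracy after the best relabelling of clusters (cluster ids are arbitrary)."""
    return max(sum(perm[p] == t for p, t in zip(pred, truth))
               for perm in permutations(range(n_donors))) / len(truth)


alt, dep, labels = simulate(2, 20, 30, random.Random(0))
pred = em_demultiplex(alt, dep, 2, random.Random(1))
print(f"matched accuracy: {matched_accuracy(pred, labels, 2):.2f}")
```

The pseudo-count `prior` keeps every allele frequency strictly between 0 and 1, so the logarithms stay finite even for variants a cluster has never observed.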
  • scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data
    Genome Biol. (IF 14.028) Pub Date : 2019-12-12
    Jose Alquicira-Hernandez; Anuja Sathe; Hanlee P. Ji; Quan Nguyen; Joseph E. Powell

    Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many tissues, as well as in both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an individual cell based on its transcriptional profile. Here, we present scPred, a new generalizable method that provides highly accurate classification of single cells, using a combination of unbiased feature selection from a reduced-dimension space and a probability-based machine-learning prediction method. We apply scPred to scRNA-seq data from pancreatic tissue, mononuclear cells, colorectal tumor biopsies, and circulating dendritic cells and show that scPred is able to classify individual cells with high accuracy. The generalized method is available at https://github.com/powellgenomicslab/scPred/.

    Updated: 2019-12-13
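The scPred abstract above combines feature selection in a reduced-dimension space with a probability-based classifier. The Python/NumPy sketch below illustrates that general pattern for two cell types only; it is not scPred's implementation. It swaps in a simple t-like statistic to rank principal components and a plain gradient-descent logistic regression as the probabilistic classifier. The expression data are simulated, and `fit_pca`, `select_pcs`, `fit_logistic`, and `ToyCellClassifier` are hypothetical names. New cells are projected with the training means and axes, so held-out data never influence the fitted PCA.

```python
import numpy as np


def fit_pca(X, n_components):
    """Center each gene and return (gene means, top principal axes) via SVD."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]


def select_pcs(scores, y, n_keep):
    """Rank PCs by |mean difference| / pooled SD between the two classes."""
    a, b = scores[y == 1], scores[y == 0]
    pooled_sd = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2) + 1e-12
    stat = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd
    return np.argsort(stat)[::-1][:n_keep]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Logistic regression fitted by plain batch gradient descent."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        err = sigmoid(Z @ w + b) - y
        w -= lr * Z.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


class ToyCellClassifier:
    """PCA -> informative-PC selection -> probabilistic classifier (binary labels)."""

    def __init__(self, n_components=10, n_keep=3):
        self.n_components = n_components
        self.n_keep = n_keep

    def _features(self, X):
        scores = (X - self.mu) @ self.axes.T
        return scores[:, self.keep] / self.scale

    def fit(self, X, y):
        self.mu, self.axes = fit_pca(X, self.n_components)
        scores = (X - self.mu) @ self.axes.T
        self.keep = select_pcs(scores, y, self.n_keep)
        self.scale = scores[:, self.keep].std(axis=0) + 1e-12
        self.w, self.b = fit_logistic(self._features(X), y)
        return self

    def predict_proba(self, X):
        """Probability that each cell belongs to class 1."""
        return sigmoid(self._features(X) @ self.w + self.b)


rng = np.random.default_rng(0)


def toy_cells(n_per_class):
    """Hypothetical log-expression for 50 genes; class-1 cells up-regulate genes 0-4."""
    X = rng.normal(size=(2 * n_per_class, 50))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :5] += 3.0
    return X, y


X_train, y_train = toy_cells(100)
X_test, y_test = toy_cells(50)
clf = ToyCellClassifier().fit(X_train, y_train)
accuracy = np.mean((clf.predict_proba(X_test) > 0.5) == y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Returning probabilities rather than hard labels leaves room for a confidence threshold below which a cell is reported as unassigned.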
  • From Hawaii to PECASE award: tips of success from a female bioinformatician
    Genome Biol. (IF 14.028) Pub Date : 2019-12-12
    Lana X. Garmire

    Women are severely under-represented in computational science fields, including bioinformatics. Despite the promotion of gender equality in faculty hiring, women face a unique set of challenges throughout the career development pipeline, such as the lack of a support system for child-raising and the lack of mentors and advocates (especially female ones) for their career success. Over the course of my tenure-track faculty positions, I have had several occasions to reflect on the journey I took as a female faculty member in a male-dominated field and as a working mom in a very demanding profession. In writing up my experience thus far, I hope to encourage junior female scientists to continue down this path. There were numerous times in my early career when I felt that I would not make it, but through perseverance I found a way through. When I started graduate school at UC Berkeley in 2001, I worked very hard in a famous experimental breast cancer lab, but I was hopelessly lost: experiments never seemed to work however hard I tried. I then recalled that my math teacher had told me never to give up mathematics (I had aced my advanced math class), and I decided to give bioinformatics, then a very new field, a try. This move was very bold, because I had grown up in China without much experience with computers. I had to take many classes in statistics, mathematics, and computer science to make the biggest career transition of my life and earn my PhD. During my postdoc at UC San Diego, I faced the notorious “two-body” (or dual-career) problem, as my husband had started a tenure-track position at the University of Hawaii. This meant taking on the almost impossible task of finding a faculty position in Hawaii, which has only one R1 research university. After a one-year detour in industry, in 2012 I got an on-site interview and was offered a tenure-track position at the University of Hawaii Cancer Center!
The following years (2012 to 2017) were a race for tenure while raising a young child in the most beautiful yet most remote state of the USA. Today, it feels surreal that all this hard work has paid off and that I received the prestigious Presidential Early Career Award for Scientists and Engineers in 2019. In the following sections, I share five tips for success that are particularly important for female scientists given their unique challenges. For more general career advice, please refer to earlier publications [1, 2].
1. You can have a career and a family at the same time. Despite the efforts of gender equality movements, the stereotype still prevails that women must choose between being a wife and being a professional or faculty member, or between being a mother and being a professional or faculty member. During my postdoc and postpartum periods, I was given unsolicited advice to give up the idea of working or becoming a faculty member and to become a housewife, by men and women, relatives and non-relatives, and people with and without PhDs. You have to be true to yourself, be courageous, and ask what will really make you happy, regardless of what others say, even when you feel discouraged. It is challenging to balance work and family life, but you can certainly do it! Many female faculty members have walked this path, and you are not alone. To succeed on this journey, you will need a supportive team. Do not be afraid to ask for help from your partner, your immediate and extended family, your employer, and your social network. Do thorough homework and plan well ahead. You will need to look for a daycare or a trustworthy nanny, probably as soon as you know you will be a parent. With an infant, you will need to make adjustments to attend conferences or NIH study sections. It has been good to see a changing climate at conferences, where services such as on-site babysitting or family viewing rooms are offered for parents of young children.
However, there will still be times when you need to arrange for a babysitter at the hotel when you attend a meeting. It is also possible to attend NIH study sections remotely. The reality is that you cannot fulfill all your roles at the same time, and it is OK not to be perfect. Focus on your priorities, such as yourself, your family, and your work, and do not feel guilty about paying for services such as house cleaning, meal preparation, or babysitting. Overall, being a parent made me a more efficient and better researcher, as I became more focused at work. By prioritizing my work better, I also grew to be a better parent to my children, which I find just as rewarding.
2. Turn “two-body problems” into two-body teamwork. Many of the female faculty I know have “two-body problems”: when they look for faculty positions, their partners also need to find academic or industrial positions nearby. This is probably the toughest personal situation for anyone, and there is no simple universal solution. You and your partner need to work together as a team to find what works best for both of you. A good strategy is for the more senior of the two to initiate the job search while the other applies for nearby positions. If you are research collaborators, consider asking for a joint hire by presenting an integrated team plan. Some personal sacrifice may be needed, as one person may take the role of main career driver while the other takes a transitional position. Remember that this situation is not your fault, and if a potential employer is not willing to help, you are probably better off taking a position elsewhere.
3. Work efficiently and be disciplined. It goes without saying that we must work hard to secure tenure-track positions.
I would like to stress “working efficiently” here: with parenting obligations, your working time is limited and you will need some strategies. What I found useful is to have long-term and short-term plans, set priorities properly, and reflect on them often. I set my weekly schedule regularly and commit to it, unless my kids get sick or I have to travel. I write down a daily to-do list and cross items off one by one, which is very gratifying. I minimize unnecessary meetings where I cannot quickly see my role, and I prefer video conference calls to commuting whenever possible. As a parent of young kids, I do not have the luxury of traveling often, so I target small, high-quality, highly relevant meetings where I can really interact with researchers and establish relationships (such as the Pacific Symposium on Biocomputing). One of the most influential writers of modern China said: “Time is like the water in a sponge; as long as you are willing to squeeze it, there will always be some (to come out).” Take advantage of the spare time between your child-care duties. For example, you may review a paper while waiting for your child outside an enrichment class, work on an unfinished manuscript while your kids nap, or simply think about project ideas while patting your kids to sleep. I run around after my kids on the weekends, but I am much more efficient now than I was before having children.
4. Be your own public relations (PR) person. Women are often perceived as more reserved than their male peers, especially in fields where they are so outnumbered. We need to act proactively to change this perception by communicating and promoting our work more, in order to increase our scholarly visibility. The conventional approaches are attending conferences and giving invited seminars at institutions, but there are newer avenues, such as social media.
Tools such as LinkedIn, Twitter, blogs, and YouTube videos, and online communities such as Slack groups, to name a few, are good alternative ways to share scientific updates, and they let you promote your work without traveling to meetings. I am a big fan of open science, and much of our work is disseminated as preprints before publication. Once a preprint is available, I tweet about it with URL links and figure snapshots to engage the research community. You can also use a more formal channel such as LinkedIn, which lets users post updates to their status. It is important to have a regularly updated research website that shows visitors news from your research group, such as publications, awards, and grants. These online tools really helped make my work known even though I was living in Hawaii, 2000 miles from the closest continent. They also helped me keep up with friends, get in touch with new collaborators, attract trainees, and stay up to date with the newest developments in the field.
5. Challenge and improve yourself. One of the best pieces of advice I got from my faculty mentor was “Do not settle.” The field of bioinformatics is constantly evolving, closely following the development of genomics technologies. As a PI, you have to train yourself to be open-minded and willing to learn constantly, both scientifically and administratively. For example, you should actively reach out to more senior (likely male) peers for collaborations. You have nothing to lose by reaching out, and people are often willing to help. At conferences or seminars, do not be shy about asking questions, as we are all there to learn. For your projects, rather than asking yourself what you feel comfortable doing, ask instead what is needed to answer the scientific questions, and engage all available resources to answer them.
After turning myself into a computational researcher, I never thought that one day I would run a wet lab, but now I do, because we need that component to generate data for our projects. Lastly, I would like to share from my own experience that you should not compare yourself with others; instead, compare who you are today with who you were yesterday. As long as you keep improving, you will be successful in your own way.
References
1. Voight BF. Keen on the tenure track job, are you? Know these things, you should. Genome Biol. 2019;20:1–4.
2. Lappalainen T. From trainee to tenure-track: ten tips. Genome Biol. 2015;16:1–3.
Acknowledgements: LXG would like to thank Dr. Barbara Cheifet for helpful comments on this piece.
Funding: LXG gratefully acknowledges support from grant K01ES025434, awarded by NIEHS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov); grants R01 LM012373 and R01 LM012907, awarded by NLM; and grant R01 HD084633, awarded by NICHD.
Affiliations: Lana X. Garmire. Previous address: University of Hawaii Cancer Center, 701 Ilalo Street, Honolulu, 96801, USA. Present address: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48105, USA.
Contributions: LXG wrote, reviewed, and approved the final manuscript.
Corresponding author: Correspondence to Lana X. Garmire.
Competing interests: The author declares that she has no competing interests.
Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Cite this article: Garmire, L.X. From Hawaii to PECASE award: tips of success from a female bioinformatician. Genome Biol 20, 271 (2019). DOI: https://doi.org/10.1186/s13059-019-1886-x
Received 11 November 2019; Accepted 11 November 2019; Published 12 December 2019

    Updated: 2019-12-13
  • CTCF modulates allele-specific sub-TAD organization and imprinted gene activity at the mouse Dlk1-Dio3 and Igf2-H19 domains
    Genome Biol. (IF 14.028) Pub Date : 2019-12-12
    David Llères; Benoît Moindrot; Rakesh Pathak; Vincent Piras; Mélody Matelot; Benoît Pignard; Alice Marchand; Mallory Poncelet; Aurélien Perrin; Virgile Tellier; Robert Feil; Daan Noordermeer

    Genomic imprinting is essential for mammalian development and provides a unique paradigm to explore intra-cellular differences in chromatin configuration. So far, the detailed allele-specific chromatin organization of imprinted gene domains has mostly been lacking. Here, we explored the chromatin structure of the two conserved imprinted domains controlled by paternal DNA methylation imprints (the Igf2-H19 and Dlk1-Dio3 domains) and assessed the involvement of the insulator protein CTCF in mouse cells. Both imprinted domains are located within overarching topologically associating domains (TADs) that are similar on both parental chromosomes. At each domain, a single differentially methylated region is bound by CTCF on the maternal chromosome only, in addition to multiple instances of bi-allelic CTCF binding. Combinations of allelic 4C-seq and DNA-FISH revealed that bi-allelic CTCF binding alone, on the paternal chromosome, correlates with a first level of sub-TAD structure. On the maternal chromosome, additional CTCF binding at the differentially methylated region adds a further layer of sub-TAD organization, which essentially hijacks the existing paternal-specific sub-TAD organization. Perturbation of the maternal-specific CTCF binding site at the Dlk1-Dio3 locus, using genome editing, results in perturbed sub-TAD organization and bi-allelic Dlk1 activation during differentiation. Maternal allele-specific CTCF binding at the imprinted Igf2-H19 and the Dlk1-Dio3 domains adds an additional layer of sub-TAD organization, on top of an existing three-dimensional configuration and prior to imprinted activation of protein-coding genes. We speculate that this allele-specific sub-TAD organization provides an instructive or permissive context for imprinted gene activation during development.

    Updated: 2019-12-13
  • Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis
    Genome Biol. (IF 14.028) Pub Date : 2019-12-10
    Shiquan Sun; Jiaqiang Zhu; Ying Ma; Xiang Zhou

    Dimensionality reduction is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNA-seq) data analysis. Proper dimensionality reduction can allow for effective noise removal and facilitate many downstream analyses that include cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of dimensionality reduction in scRNA-seq analysis and the vast number of dimensionality reduction methods developed for scRNA-seq studies, few comprehensive comparison studies have been performed to evaluate the effectiveness of different dimensionality reduction methods in scRNA-seq. We aim to fill this critical knowledge gap by providing a comparative evaluation of a variety of commonly used dimensionality reduction methods for scRNA-seq studies. Specifically, we compare 18 different dimensionality reduction methods on 30 publicly available scRNA-seq datasets that cover a range of sequencing techniques and sample sizes. We evaluate the performance of different dimensionality reduction methods for neighborhood preservation in terms of their ability to recover features of the original expression matrix, and for cell clustering and lineage reconstruction in terms of their accuracy and robustness. We also evaluate the computational scalability of different dimensionality reduction methods by recording their computational cost. Based on the comprehensive evaluation results, we provide important guidelines for choosing dimensionality reduction methods for scRNA-seq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html.

    Updated: 2019-12-11
  • The Pseudomonas aeruginosa accessory genome elements influence virulence towards Caenorhabditis elegans
    Genome Biol. (IF 14.028) Pub Date : 2019-12-10
    Alejandro Vasquez-Rifo; Isana Veksler-Lublinsky; Zhenyu Cheng; Frederick M. Ausubel; Victor Ambros

    Multicellular animals and bacteria frequently engage in predator-prey and host-pathogen interactions, such as the well-studied relationship between Pseudomonas aeruginosa and the nematode Caenorhabditis elegans. This study investigates the genomic and genetic basis of bacterial-driven variability in P. aeruginosa virulence towards C. elegans to provide evolutionary insights into host-pathogen relationships. Natural isolates of P. aeruginosa that exhibit diverse genomes display a broad range of virulence towards C. elegans. Using gene association and genetic analysis, we identify accessory genome elements that correlate with virulence, including both known and novel virulence determinants. Among the novel genes, we find a viral-like mobile element, the teg block, that impairs virulence and whose acquisition is restricted by CRISPR-Cas systems. Further genetic and genomic evidence suggests that spacer-targeted elements preferentially associate with lower virulence while the presence of CRISPR-Cas associates with higher virulence. Our analysis demonstrates substantial strain variation in P. aeruginosa virulence, mediated by specific accessory genome elements that promote increased or decreased virulence. We exemplify that viral-like accessory genome elements that decrease virulence can be restricted by bacterial CRISPR-Cas immune defense systems, and suggest a positive, albeit indirect, role for host CRISPR-Cas systems in virulence maintenance.

    Updated: 2019-12-11
  • The majority of A-to-I RNA editing is not required for mammalian homeostasis
    Genome Biol. (IF 14.028) Pub Date : 2019-12-09
    Alistair M. Chalk; Scott Taylor; Jacki E. Heraud-Farlow; Carl R. Walkley

    Adenosine-to-inosine (A-to-I) RNA editing, mediated by ADAR1 and ADAR2, occurs at tens of thousands to millions of sites across mammalian transcriptomes. A-to-I editing can change the protein coding potential of a transcript and alter RNA splicing, miRNA biology, RNA secondary structure and formation of other RNA species. In vivo, the editing-dependent protein recoding of GRIA2 is the essential function of ADAR2, while ADAR1 editing prevents innate immune sensing of endogenous RNAs by MDA5 in both human and mouse. However, a significant proportion of A-to-I editing sites can be edited by both ADAR1 and ADAR2, particularly within the brain where both are highly expressed. The physiological function(s) of these shared sites, including those evolutionarily conserved, is largely unknown. To generate completely A-to-I editing-deficient mammals, we crossed the viable rescued ADAR1-editing-deficient animals (Adar1^E861A/E861A Ifih1^−/−) with rescued ADAR2-deficient (Adarb1^−/− Gria2^R/R) animals. Unexpectedly, the global absence of editing was well tolerated. Adar1^E861A/E861A Ifih1^−/− Adarb1^−/− Gria2^R/R mice were recovered at Mendelian ratios and age normally. Detailed transcriptome analysis demonstrated that editing was absent in the brains of the compound mutants and that ADAR1 and ADAR2 have similar editing site preferences and patterns. We conclude that ADAR1 and ADAR2 are non-redundant and do not compensate for each other’s essential functions in vivo. Physiologically essential A-to-I editing comprises a small subset of the editome, and the majority of editing is dispensable for mammalian homeostasis. Moreover, in vivo biologically essential protein recoding mediated by A-to-I editing is an exception in mammals.

    Updated: 2019-12-09
  • Chromosome-level genome assembly for giant panda provides novel insights into Carnivora chromosome evolution
    Genome Biol. (IF 14.028) Pub Date : 2019-12-06
    Huizhong Fan; Qi Wu; Fuwen Wei; Fengtang Yang; Bee Ling Ng; Yibo Hu

    Chromosome evolution is an important driver of speciation and species evolution. Previous studies have detected chromosome rearrangement events among different Carnivora species using chromosome painting strategies. However, few of these studies have focused on chromosome evolution at a nucleotide resolution due to the limited availability of chromosome-level Carnivora genomes. Although the de novo genome assembly of the giant panda is available, current short read-based assemblies are limited to moderately sized scaffolds, making the study of chromosome evolution difficult. Here, we present a chromosome-level giant panda draft genome with a total size of 2.29 Gb. Based on the giant panda genome and published chromosome-level dog and cat genomes, we conduct six large-scale pairwise synteny alignments and identify evolutionary breakpoint regions. Interestingly, gene functional enrichment analysis shows that for all three Carnivora genomes, some genes located in evolutionary breakpoint regions are significantly enriched in pathways or terms related to sensory perception of smell. In addition, we find that the sweet receptor gene TAS1R2, which has been proven to be a pseudogene in the cat genome, is located in an evolutionary breakpoint region of the giant panda, suggesting that interchromosomal rearrangement may play a role in the cat TAS1R2 pseudogenization. We show that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies. Moreover, our comparative genomics analyses provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.

    Updated: 2019-12-07
  • Dashing: fast and accurate genomic distances with HyperLogLog
    Genome Biol. (IF 14.028) Pub Date : 2019-12-04
    Daniel N. Baker; Ben Langmead

    Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
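    The way HyperLogLog sketches are combined for set unions and intersections can be illustrated with a minimal Python sketch. It is a simplified stand-in, not Dashing's own C++ implementation. It assumes the classic HyperLogLog estimator with a linear-counting correction for small sets, a 64-bit BLAKE2b hash, and intersections derived by inclusion-exclusion, whereas Dashing uses its own, more accurate cardinality estimators. The precision `p = 12`, the k-mer length `k = 21` and the random toy genomes are illustrative assumptions.

```python
import hashlib
import math
import random


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Classic HyperLogLog with 2**p registers (illustrative, not Dashing's code)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits choose a register, the remaining bits give the rank.
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = 1-based position of the leftmost 1-bit within the remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        # The sketch of A ∪ B is the register-wise maximum of the two sketches.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small sets
        return raw


def sketch(seq: str, k: int = 21, p: int = 12) -> HyperLogLog:
    """Sketch the k-mer set of a sequence."""
    hll = HyperLogLog(p)
    for km in kmers(seq, k):
        hll.add(km)
    return hll


def jaccard(a: HyperLogLog, b: HyperLogLog) -> float:
    """Estimate |A ∩ B| / |A ∪ B|, with the intersection from inclusion-exclusion."""
    union = a.union(b).cardinality()
    inter = max(0.0, a.cardinality() + b.cardinality() - union)
    return inter / union


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    # A second toy "genome" that shares its first half with the first one.
    other = genome[:10_000] + "".join(rng.choice("ACGT") for _ in range(10_000))
    print(f"estimated Jaccard: {jaccard(sketch(genome), sketch(other)):.3f}")
```

    Merging by register-wise maximum makes the union of two sketches identical to the sketch of the union, so unions are cheap; the intersection is only derived, so its relative error grows as the true overlap shrinks.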

    Updated: 2019-12-04
  • Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression
    Genome Biol. (IF 14.028) Pub Date : 2019-12-04
    Kujin Tang; Jie Ren; Fengzhu Sun

    Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.
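    The bias that Afann adjusts for can be reproduced with a small simulation. This sketch does not implement Afann's neural-network regression. It assumes the d2 dissimilarity, a common alignment-free measure computed from k-mer count vectors X and Y as d2 = (1 - X·Y / (|X| |Y|)) / 2, and it models sequencing errors as uniform random substitutions. The genome size, read count, read length, error rates and `k` are arbitrary illustrative values.

```python
import math
import random
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers across a collection of sequences (genome or reads)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def d2_dissimilarity(x: Counter, y: Counter) -> float:
    """d2 = (1 - cosine similarity of the k-mer count vectors) / 2."""
    dot = sum(c * y[km] for km, c in x.items())  # Counter returns 0 for absent k-mers
    norm = math.sqrt(sum(c * c for c in x.values())) * math.sqrt(sum(c * c for c in y.values()))
    if norm == 0:
        raise ValueError("empty k-mer profile")
    return 0.5 * (1.0 - dot / norm)


def add_errors(seq, rate, rng):
    """Substitute each base with a different random base with probability `rate`."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < rate else base
        for base in seq
    )


def sample_reads(genome, n_reads, read_len, error_rate, rng):
    """Draw reads uniformly from the genome and add substitution errors."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(add_errors(genome[start:start + read_len], error_rate, rng))
    return reads


if __name__ == "__main__":
    rng = random.Random(42)
    k = 12
    genome_a = "".join(rng.choice("ACGT") for _ in range(20_000))
    genome_b = add_errors(genome_a, 0.02, rng)  # a related toy "genome"
    genome_d2 = d2_dissimilarity(kmer_counts([genome_a], k), kmer_counts([genome_b], k))
    reads_a = sample_reads(genome_a, 2_000, 100, 0.01, rng)  # about 10x coverage
    reads_b = sample_reads(genome_b, 2_000, 100, 0.01, rng)
    read_d2 = d2_dissimilarity(kmer_counts(reads_a, k), kmer_counts(reads_b, k))
    print(f"genome-level d2 = {genome_d2:.3f}, read-level d2 = {read_d2:.3f}")
```

    Each substitution alters up to k overlapping k-mers, mostly turning them into k-mers absent from the other sample, so erroneous reads inflate the vector norms without adding to the dot product. This is one mechanism behind the overestimation described in the abstract.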

    Updated: 2019-12-04
  • eIF4A2 drives repression of translation at initiation by Ccr4-Not through purine-rich motifs in the 5′UTR
    Genome Biol. (IF 14.028) Pub Date : 2019-12-02
    Ania Wilczynska; Sarah L. Gillen; Tobias Schmidt; Hedda A. Meijer; Rebekah Jukes-Jones; Claudia Langlais; Kari Kopra; Wei-Ting Lu; Jack D. Godfrey; Benjamin R. Hawley; Kelly Hodge; Sara Zanivan; Kelvin Cain; John Le Quesne; Martin Bushell

    Regulation of the mRNA life cycle is central to gene expression control and determination of cell fate. miRNAs represent a critical mRNA regulatory mechanism, but despite decades of research, their mode of action is still not fully understood. Here, we show that eIF4A2 is a major effector of the repressive miRNA pathway functioning via the Ccr4-Not complex. We demonstrate that while DDX6 interacts with Ccr4-Not, its effects in the mechanism are not as pronounced. Through its interaction with the Ccr4-Not complex, eIF4A2 represses mRNAs at translation initiation. We show evidence that native eIF4A2 has similar RNA selectivity to chemically inhibited eIF4A1. eIF4A2 exerts its repressive effect by binding purine-rich motifs which are enriched in the 5′UTR of target mRNAs directly upstream of the AUG start codon. Our data support a model whereby purine motifs towards the 3′ end of the 5′UTR are associated with increased ribosome occupancy and possible uORF activation upon eIF4A2 binding.

    Updated: 2019-12-02
  • CRISPR-Cas13d mediates robust RNA virus interference in plants
    Genome Biol. (IF 14.028) Pub Date : 2019-12-02
    Ahmed Mahas; Rashid Aman; Magdy Mahfouz

    CRISPR-Cas systems endow bacterial and archaeal species with adaptive immunity mechanisms to fend off invading phages and foreign genetic elements. CRISPR-Cas9 has been harnessed to confer virus interference against DNA viruses in eukaryotes, including plants. In addition, CRISPR-Cas13 systems have been used to target RNA viruses and the transcriptome in mammalian and plant cells. Recently, CRISPR-Cas13a has been shown to confer modest interference against RNA viruses. Here, we characterized a set of different Cas13 variants to identify those with the most efficient, robust, and specific interference activities against RNA viruses in planta using Nicotiana benthamiana. Our data show that LwaCas13a, PspCas13b, and CasRx variants mediate high interference activities against RNA viruses in transient assays. Moreover, CasRx mediated robust interference in both transient and stable overexpression assays when compared to the other variants tested. CasRx targets either one virus alone or two RNA viruses simultaneously, with robust interference efficiencies. In addition, CasRx exhibits strong specificity against the target virus and does not exhibit collateral activity in planta. Our data establish CasRx as the most robust Cas13 variant for RNA virus interference applications in planta and demonstrate its suitability for studying key questions relating to virus biology.

    Updated: 2019-12-02
  • Characterizing the interplay between gene nucleotide composition bias and splicing
    Genome Biol. (IF 14.028) Pub Date : 2019-11-29
    Sébastien Lemaire; Nicolas Fontrodona; Fabien Aubé; Jean-Baptiste Claude; Hélène Polvèche; Laurent Modolo; Cyril F. Bourgeois; Franck Mortreux; Didier Auboeuf

    Nucleotide composition bias plays an important role in the 1D and 3D organization of the human genome. Here, we investigate the potential interplay between nucleotide composition bias and the regulation of exon recognition during splicing. By analyzing dozens of RNA-seq datasets, we identify two groups of splicing factors that activate either about 3200 GC-rich exons or about 4000 AT-rich exons. We show that splicing factor–dependent GC-rich exons have predicted RNA secondary structures at 5′ ss and are dependent on U1 snRNP–associated proteins. In contrast, splicing factor–dependent AT-rich exons have a large number of decoy branch points, SF1- or U2AF2-binding sites and are dependent on U2 snRNP–associated proteins. Nucleotide composition bias also influences local chromatin organization, with consequences for exon recognition during splicing. Interestingly, the GC content of exons correlates with that of their hosting genes, isochores, and topologically associated domains. We propose that regional nucleotide composition bias over several dozens of kilobase pairs leaves a local footprint at the exon level and induces constraints during splicing that can be alleviated by local chromatin organization at the DNA level and recruitment of specific splicing factors at the RNA level. Therefore, nucleotide composition bias establishes a direct link between genome organization and local regulatory processes, like alternative splicing.

    Updated: 2019-11-30
  • ReorientExpress: reference-free orientation of nanopore cDNA reads with deep learning
    Genome Biol. (IF 14.028) Pub Date : 2019-11-29
    Angel Ruiz-Reche; Akanksha Srivastava; Joel A. Indi; Ivan de la Rubia; Eduardo Eyras

    We describe ReorientExpress, a method to perform reference-free orientation of transcriptomic long sequencing reads. ReorientExpress uses deep learning to correctly predict the orientation of the majority of reads, and in particular when trained on a closely related species or in combination with read clustering. ReorientExpress enables long-read transcriptomics in non-model organisms and samples without a genome reference without using additional technologies and is available at https://github.com/comprna/reorientexpress.
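    A reference-free orientation classifier needs labeled training data. The minimal sketch below shows one way to build such data, under two simplifying assumptions that may not match ReorientExpress's exact procedure: reads in the wrong orientation are modelled as reverse complements of annotated transcripts, and each sequence is represented by normalized k-mer frequencies that a model such as a neural network could take as input. The labels 0 and 1 and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_features(seq: str, k: int = 3) -> list[float]:
    """Frequencies of all 4**k k-mers in a fixed lexicographic order (k-mers with N are ignored)."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in vocab) or 1
    return [counts[km] / total for km in vocab]


def make_training_set(transcripts, k=3):
    """Hypothetical labelling: 0 = transcript orientation, 1 = reverse complement."""
    X, y = [], []
    for t in transcripts:
        X.append(kmer_features(t, k))
        y.append(0)
        X.append(kmer_features(reverse_complement(t), k))
        y.append(1)
    return X, y
```

    A classifier trained on `(X, y)` would then predict, for an unannotated read, whether it should be reverse-complemented before downstream analysis.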

    Updated: 2019-11-30
  • methylCC: technology-independent estimation of cell type composition using differentially methylated regions
    Genome Biol. (IF 14.028) Pub Date : 2019-11-29
    Stephanie C. Hicks; Rafael A. Irizarry

    A major challenge in the analysis of DNA methylation (DNAm) data is variability introduced by intra-sample cellular heterogeneity, as in whole blood, whose DNAm profile is a convolution of the profiles of its constituent cell types. When this source of variability is confounded with an outcome of interest and left unaccounted for, false positives ensue. Current methods to estimate the cell type proportions in whole blood DNAm samples are only appropriate for one technology and lead to technology-specific biases if applied to data generated from other technologies. Here, we propose a technology-independent alternative: methylCC, which is available at https://github.com/stephaniehicks/methylCC.
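    The underlying deconvolution problem can be illustrated in its simplest form. Assume a sample is a mixture of only two cell types whose reference methylation levels at a set of differentially methylated regions are known, so that y_j ≈ p·a_j + (1 - p)·b_j at each region j. The mixing proportion p then has a closed-form least-squares solution, p = Σ(y_j - b_j)(a_j - b_j) / Σ(a_j - b_j)², clipped to [0, 1]. This is a generic textbook illustration, not methylCC's model, and the reference values in the usage example are made up.

```python
from typing import Sequence


def two_type_proportion(sample: Sequence[float],
                        ref_a: Sequence[float],
                        ref_b: Sequence[float]) -> float:
    """Least-squares fraction of cell type A in a two-type mixture, clipped to [0, 1].

    sample, ref_a, ref_b: methylation levels (0-1) at the same regions, in the same order.
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(sample, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical at every region")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Hypothetical reference profiles at four regions.
    ref_a = [0.9, 0.1, 0.8, 0.2]
    ref_b = [0.1, 0.7, 0.3, 0.9]
    mixture = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]
    print(two_type_proportion(mixture, ref_a, ref_b))
```

    With more than two cell types the same idea becomes a constrained (non-negative, sum-to-one) regression, which is where reference choice and technology-specific biases start to matter.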

    Updated: 2019-11-30
  • RegSNPs-intron: a computational framework for predicting pathogenic impact of intronic single nucleotide variants
    Genome Biol. (IF 14.028) Pub Date : 2019-11-28
    Hai Lin; Katherine A. Hargreaves; Rudong Li; Jill L. Reiter; Yue Wang; Matthew Mort; David N. Cooper; Yaoqi Zhou; Chi Zhang; Michael T. Eadon; M. Eileen Dolan; Joseph Ipe; Todd C. Skaar; Yunlong Liu

    Single nucleotide variants (SNVs) in intronic regions have yet to be systematically investigated for their disease-causing potential. Using known pathogenic and neutral intronic SNVs (iSNVs) as training data, we develop the RegSNPs-intron algorithm based on a random forest classifier that integrates RNA splicing, protein structure, and evolutionary conservation features. RegSNPs-intron showed excellent performance in evaluating the pathogenic impacts of iSNVs. Using a high-throughput functional reporter assay called ASSET-seq (ASsay for Splicing using ExonTrap and sequencing), we evaluate the impact of RegSNPs-intron predictions on splicing outcome. Together, RegSNPs-intron and ASSET-seq enable effective prioritization of iSNVs for disease pathogenesis.

    Updated: 2019-11-29
  • Common DNA sequence variation influences 3-dimensional conformation of the human genome
    Genome Biol. (IF 14.028) Pub Date : 2019-11-28
    David U. Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony D. Schmitt; Amina Noor; Joshua Chiou; Kyle J. Gaulton; Jonathan Sebat; Yun Li; Kasper D. Hansen; Bing Ren

    The 3-dimensional (3D) conformation of chromatin inside the nucleus is integral to a variety of nuclear processes including transcriptional regulation, DNA replication, and DNA damage repair. Aberrations in 3D chromatin conformation have been implicated in developmental abnormalities and cancer. Despite the importance of 3D chromatin conformation to cellular function and human health, little is known about how 3D chromatin conformation varies in the human population, or whether DNA sequence variation between individuals influences 3D chromatin conformation. To address these questions, we perform Hi-C on lymphoblastoid cell lines from 20 individuals. We identify thousands of regions across the genome where 3D chromatin conformation varies between individuals and find that this variation is often accompanied by variation in gene expression, histone modifications, and transcription factor binding. Moreover, we find that DNA sequence variation influences several features of 3D chromatin conformation including loop strength, contact insulation, contact directionality, and density of local cis contacts. We map hundreds of quantitative trait loci associated with 3D chromatin features and find evidence that some of these same variants are associated at modest levels with other molecular phenotypes as well as complex disease risk. Our results demonstrate that common DNA sequence variants can influence 3D chromatin conformation, pointing to a more pervasive role for 3D chromatin conformation in human phenotypic variation than previously recognized.

    Updated: 2019-11-29
  • The Aquilegia genome reveals a hybrid origin of core eudicots
    Genome Biol. (IF 14.028) Pub Date : 2019-11-28
    Gökçe Aköz; Magnus Nordborg

    Whole-genome duplications (WGDs) have dominated the evolutionary history of plants. One consequence of WGD is a dramatic restructuring of the genome as it undergoes diploidization, a process under which deletions and rearrangements of various sizes scramble the genetic material, leading to a repacking of the genome and eventual return to diploidy. Here, we investigate the history of WGD in the columbine genus Aquilegia, a basal eudicot, and use it to illuminate the origins of the core eudicots. Within-genome synteny confirms that columbines are ancient tetraploids, and comparison with the grape genome reveals that this tetraploidy appears to be shared with the core eudicots. Thus, the ancient gamma hexaploidy found in all core eudicots must have involved a two-step process: first, tetraploidy in the ancestry of all eudicots, then hexaploidy in the ancestry of core eudicots. Furthermore, the precise pattern of synteny sharing suggests that the latter involved allopolyploidization and that core eudicots thus have a hybrid origin. Novel analyses of synteny sharing together with the well-preserved structure of the columbine genome reveal that the gamma hexaploidy at the root of core eudicots is likely a result of hybridization between a tetraploid and a diploid species.

    Updated: 2019-11-29
  • Improved metagenomic analysis with Kraken 2
    Genome Biol. (IF 14.028) Pub Date : 2019-11-28
    Derrick E. Wood; Jennifer Lu; Ben Langmead

    Although Kraken’s k-mer-based approach provides a fast taxonomic classification of metagenomic sequence data, its large memory requirements can be limiting for some applications. Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed fivefold. Kraken 2 also introduces a translated search mode, providing increased sensitivity in viral metagenomics analysis.
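    Part of Kraken 2's memory saving comes from indexing minimizers rather than every k-mer. The sketch below shows only the basic idea under simplifying assumptions: each k-mer is represented by its lexicographically smallest l-mer. Kraken 2 itself adds refinements (for example spaced seeds and a hash-based ordering), so its minimizers will differ from this output. The `k` and `l` values in the usage example are chosen for readability, not Kraken 2's defaults.

```python
def minimizers(seq: str, k: int, l: int) -> list[str]:
    """Return the minimizer (smallest l-mer, lexicographic order) of each k-mer in seq."""
    if not 0 < l <= k:
        raise ValueError("require 0 < l <= k")
    out = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Naive O(k) scan per k-mer; real tools use a sliding-window deque.
        out.append(min(kmer[j:j + l] for j in range(k - l + 1)))
    return out


def distinct_minimizers(seq: str, k: int, l: int) -> list[str]:
    """Consecutive k-mers often share a minimizer, so far fewer keys need storing."""
    return sorted(set(minimizers(seq, k, l)))


if __name__ == "__main__":
    print(minimizers("GATTACA", k=4, l=2))
    print(distinct_minimizers("GATTACA", k=4, l=2))
```

    For "GATTACA" with k = 4 and l = 2, the four k-mers GATT, ATTA, TTAC and TACA collapse to two distinct minimizers, AT and AC, which is the kind of redundancy a minimizer-based index exploits.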

    Updated: 2019-11-29
  • 547 transcriptomes from 44 brain areas reveal features of the aging brain in non-human primates
    Genome Biol. (IF 14.028) Pub Date : 2019-11-28
    Ming-Li Li; Shi-Hao Wu; Jin-Jin Zhang; Hang-Yu Tian; Yong Shao; Zheng-Bo Wang; David M. Irwin; Jia-Li Li; Xin-Tian Hu; Dong-Dong Wu

    Brain aging is a complex process that depends on the precise regulation of multiple brain regions; however, the underlying molecular mechanisms behind this process remain to be clarified in non-human primates. Here, we explore non-human primate brain aging using 547 transcriptomes originating from 44 brain areas in rhesus macaques (Macaca mulatta). We show that expression connectivity between pairs of cerebral cortex areas as well as expression symmetry between the left and right hemispheres both decrease with aging. Although the aging mechanisms across different brain areas are largely convergent, changes in gene expression and alternative splicing vary at diverse genes, reinforcing the complex multifactorial basis of aging. Through gene co-expression network analysis, we identify nine modules that exhibit gain of connectivity in the aged brain and uncover a hub gene, PGLS, underlying brain aging. We further confirm the functional significance of PGLS in mice at the gene transcription, molecular, and behavioral levels. Taken together, our study provides comprehensive transcriptomes on multiple brain regions in non-human primates and provides novel insights into the molecular mechanism of healthy brain aging.

    Updated: 2019-11-29
  • Pharmacogenomic analysis of patient-derived tumor cells in gynecologic cancers
    Genome Biol. (IF 14.028) Pub Date : 2019-11-26
    Jason K. Sa; Jae Ryoung Hwang; Young-Jae Cho; Ji-Yoon Ryu; Jung-Joo Choi; Soo Young Jeong; Jihye Kim; Myeong Seon Kim; E. Sun Paik; Yoo-Young Lee; Chel Hun Choi; Tae-Joong Kim; Byoung-Gie Kim; Duk-Soo Bae; Yeri Lee; Nam-Gu Her; Yong Jae Shin; Hee Jin Cho; Ja Yeon Kim; Yun Jee Seo; Harim Koo; Jeong-Woo Oh; Taebum Lee; Hyun-Soo Kim; Sang Yong Song; Joon Seol Bae; Woong-Yang Park; Hee Dong Han; Hyung Jun Ahn; Anil K. Sood; Raul Rabadan; Jin-Ku Lee; Do-Hyun Nam; Jeong-Won Lee

    Gynecologic malignancy is one of the leading causes of mortality in female adults worldwide. Comprehensive genomic analysis has revealed a list of molecular aberrations that are essential to tumorigenesis, progression, and metastasis of gynecologic tumors. However, targeting such alterations has frequently led to treatment failures due to underlying genomic complexity and simultaneous activation of various tumor cell survival pathway molecules. A compilation of molecular characterization of tumors with pharmacological drug response is the next step toward clinical application of patient-tailored treatment regimens. Toward this goal, we establish a library of 139 gynecologic tumors including epithelial ovarian cancers (EOCs), cervical and endometrial tumors, and uterine sarcomas that are genomically and/or pharmacologically annotated and explore dynamic pharmacogenomic associations against 37 molecularly targeted drugs. We discover lineage-specific drug sensitivities based on subcategorization of gynecologic tumors and identify TP53 mutation as a molecular determinant that elicits therapeutic response to poly(ADP-ribose) polymerase (PARP) inhibitors. We further identify transcriptome expression of inhibitor of DNA binding 2 (ID2) as a potential predictive biomarker for treatment response to olaparib. Together, our results demonstrate the potential utility of rapid drug screening combined with genomic profiling for precision treatment of gynecologic cancers.

    Updated: 2019-11-27
  • DNA methylation aging clocks: challenges and recommendations
    Genome Biol. (IF 14.028) Pub Date : 2019-11-25
    Christopher G. Bell; Robert Lowe; Peter D. Adams; Andrea A. Baccarelli; Stephan Beck; Jordana T. Bell; Brock C. Christensen; Vadim N. Gladyshev; Bastiaan T. Heijmans; Steve Horvath; Trey Ideker; Jean-Pierre J. Issa; Karl T. Kelsey; Riccardo E. Marioni; Wolf Reik; Caroline L. Relton; Leonard C. Schalkwyk; Andrew E. Teschendorff; Wolfgang Wagner; Kang Zhang; Vardhman K. Rakyan

    Epigenetic clocks comprise a set of CpG sites whose DNA methylation levels measure subject age. These clocks are acknowledged as a highly accurate molecular correlate of chronological age in humans and other vertebrates. Also, extensive research is aimed at their potential to quantify biological aging rates and test longevity or rejuvenating interventions. Here, we discuss key challenges in understanding clock mechanisms and biomarker utility. This requires dissecting the drivers and regulators of age-related changes in single-cell, tissue- and disease-specific models, as well as exploring other epigenomic marks, longitudinal and diverse population studies, and non-human models. We also highlight important ethical issues in forensic age determination and predicting the trajectory of biological aging in an individual.
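    In their most common form, such clocks are penalized linear regressions: predicted age is an intercept plus a weighted sum of methylation levels (beta values) at the selected CpGs. The sketch below applies such a model. The CpG identifiers, weights and intercept are hypothetical values made up for illustration. Published clocks use hundreds of CpGs and, in some cases, a nonlinear transformation of age that must be inverted after the linear step, which is omitted here.

```python
from typing import Mapping


def predict_age(betas: Mapping[str, float],
                weights: Mapping[str, float],
                intercept: float) -> float:
    """Linear epigenetic clock: intercept + sum(weight * beta) over the clock's CpGs.

    CpGs in `betas` that are not part of the clock are ignored;
    missing clock CpGs raise an error instead of being silently imputed.
    """
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        raise KeyError(f"missing clock CpGs: {missing}")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Hypothetical clock: CpG names and coefficients are invented for illustration only.
TOY_WEIGHTS = {"cpg_1": 40.0, "cpg_2": -25.0, "cpg_3": 10.0}
TOY_INTERCEPT = 30.0

if __name__ == "__main__":
    sample = {"cpg_1": 0.5, "cpg_2": 0.4, "cpg_3": 0.2, "cpg_other": 0.9}
    print(predict_age(sample, TOY_WEIGHTS, TOY_INTERCEPT))
```

    Raising on missing CpGs is a deliberate choice in this sketch: array platforms and preprocessing pipelines do not always cover every clock CpG, and silently dropping terms would shift the prediction.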

    Updated: 2019-11-26
  • Crowdfunding science
    Genome Biol. (IF 14.028) Pub Date : 2019-11-25
    Melissa A. Wilson

    Updated: 2019-11-26
  • MIA-Sig: multiplex chromatin interaction analysis by signal processing and statistical algorithms
    Genome Biol. (IF 14.028) Pub Date : 2019-11-25
    Minji Kim; Meizhen Zheng; Simon Zhongyuan Tian; Byoungkoo Lee; Jeffrey H. Chuang; Yijun Ruan

    Single-molecule multiplex chromatin interaction data are generated by emerging 3D genome mapping technologies such as GAM, SPRITE, and ChIA-Drop. These datasets provide insights into high-dimensional chromatin organization, yet introduce new computational challenges. Thus, we developed MIA-Sig, an algorithmic solution based on signal processing and information theory. We demonstrate its ability to de-noise the multiplex data, assess the statistical significance of chromatin complexes, and identify topological domains and frequent inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains. Together, MIA-Sig represents a novel algorithmic framework for multiplex chromatin interaction analysis.

    Updated: 2019-11-26
  • Gut-derived Enterococcus faecium from ulcerative colitis patients promotes colitis in a genetically susceptible mouse host
    Genome Biol. (IF 14.028) Pub Date : 2019-11-25
    Jun Seishima; Noriho Iida; Kazuya Kitamura; Masahiro Yutani; Ziyu Wang; Akihiro Seki; Taro Yamashita; Yoshio Sakai; Masao Honda; Tatsuya Yamashita; Takashi Kagaya; Yukihiro Shirota; Yukako Fujinaga; Eishiro Mizukoshi; Shuichi Kaneko

    Recent metagenomic analyses have revealed dysbiosis of the gut microbiota of ulcerative colitis (UC) patients. However, the impacts of this dysbiosis are not fully understood, particularly at the strain level. We perform whole-genome shotgun sequencing of fecal DNA extracts from 13 healthy donors and 16 UC and 8 Crohn’s disease (CD) patients. The microbiota of UC and CD patients is taxonomically and functionally divergent from that of healthy donors, with E. faecium being the most differentially abundant species between the two microbial communities. Transplantation of feces from UC or CD patients into Il10−/− mice promotes pathological inflammation and cytokine expression in the mouse colon, although distinct cytokine expression profiles are observed between UC and CD. Unlike isolates derived from healthy donors, E. faecium isolates from the feces of UC patients, along with E. faecium strain ATCC 19434, promote colitis and colonic cytokine expression. Inflammatory E. faecium strains, including ATCC 19434 and a UC-derived strain, cluster separately from commercially available probiotic strains based on whole-genome shotgun sequencing analysis. The presence of E. faecium in fecal samples is associated with large disease extent and the need for multiple medications in UC patients. E. faecium strains derived from UC patients display an inflammatory genotype that causes colitis.

    Updated: 2019-11-26
Contents have been reproduced by permission of the publishers.