Comprehensive functional genomic resource and integrative model for the human brain
Science ( IF 56.9 ) Pub Date : 2018-12-13 , DOI: 10.1126/science.aat8464
Daifeng Wang 1, 2, 3 , Shuang Liu 1, 2 , Jonathan Warrell 1, 2 , Hyejung Won 4, 5 , Xu Shi 1, 2 , Fabio C. P. Navarro 1, 2 , Declan Clarke 1, 2 , Mengting Gu 1 , Prashant Emani 1, 2 , Yucheng T. Yang 1, 2 , Min Xu 1, 2 , Michael J. Gandal 6 , Shaoke Lou 1, 2 , Jing Zhang 1, 2 , Jonathan J. Park 1, 2 , Chengfei Yan 1, 2 , Suhn Kyong Rhie 7 , Kasidet Manakongtreecheep 1, 2 , Holly Zhou 1, 2 , Aparna Nathan 1, 2 , Mette Peters 8 , Eugenio Mattei 9 , Dominic Fitzgerald 10 , Tonya Brunetti 10 , Jill Moore 9 , Yan Jiang 11 , Kiran Girdhar 12 , Gabriel E. Hoffman 12 , Selim Kalayci 12 , Zeynep H. Gümüş 12 , Gregory E. Crawford 13 , Panos Roussos 11, 12 , Schahram Akbarian 11, 14 , Andrew E. Jaffe 15 , Kevin P. White 10, 16 , Zhiping Weng 9 , Nenad Sestan 17 , Daniel H. Geschwind 18, 19, 20 , James A. Knowles 21 , Mark B. Gerstein 1, 2, 22, 23 , Allison E. Ashley-Koch , Gregory E. Crawford , Melanie E. Garrett , Lingyun Song , Alexias Safi , Graham D. Johnson , Gregory A. Wray , Timothy E Reddy , Fernando S. Goes , Peter Zandi , Julien Bryois , Andrew E. Jaffe , Amanda J. Price , Nikolay A. Ivanov , Leonardo Collado-Torres , Thomas M. Hyde , Emily E. Burke , Joel E. Kleiman , Ran Tao , Joo Heon Shin , Schahram Akbarian , Kiran Girdhar , Yan Jiang , Marija Kundakovic , Leanne Brown , Bibi S. Kassim , Royce B. Park , Jennifer R Wiseman , Elizabeth Zharovsky , Rivka Jacobov , Olivia Devillers , Elie Flatow , Gabriel E. Hoffman , Barbara K. Lipska , David A. Lewis , Vahram Haroutunian , Chang-Gyu Hahn , Alexander W. Charney , Stella Dracheva , Alexey Kozlenkov , Judson Belmont , Diane DelValle , Nancy Francoeur , Evi Hadjimichael , Dalila Pinto , Harm van Bakel , Panos Roussos , John F. Fullard , Jaroslav Bendl , Mads E. Hauberg , Lara M Mangravite , Mette A. Peters , Yooree Chae , Junmin Peng , Mingming Niu , Xusheng Wang , Maree J. Webster , Thomas G. Beach , Chao Chen , Yi Jiang , Rujia Dai , Annie W. Shieh , Chunyu Liu , Kay S. 
Grennan , Yan Xia , Ramu Vadukapuram , Yongjun Wang , Dominic Fitzgerald , Lijun Cheng , Miguel Brown , Mimi Brown , Tonya Brunetti , Thomas Goodman , Majd Alsayed , Michael J. Gandal , Daniel H. Geschwind , Hyejung Won , Damon Polioudakis , Brie Wamsley , Jiani Yin , Tarik Hadzic , Luis De La Torre Ubieta , Vivek Swarup , Stephan J. Sanders , Matthew W. State , Donna M. Werling , Joon-Yong An , Brooke Sheppard , A. Jeremy Willsey , Kevin P. White , Mohana Ray , Gina Giase , Amira Kefi , Eugenio Mattei , Michael Purcaro , Zhiping Weng , Jill Moore , Henry Pratt , Jack Huey , Tyler Borrman , Patrick F. Sullivan , Paola Giusti-Rodriguez , Yunjung Kim , Patrick Sullivan , Jin Szatkiewicz , Suhn Kyong Rhie , Christoper Armoskus , Adrian Camarena , Peggy J. Farnham , Valeria N. Spitsyna , Heather Witt , Shannon Schreiner , Oleg V. Evgrafov , James A. Knowles , Mark Gerstein , Shuang Liu , Daifeng Wang , Fabio C. P. Navarro , Jonathan Warrell , Declan Clarke , Prashant S. Emani , Mengting Gu , Xu Shi , Min Xu , Yucheng T. Yang , Robert R. Kitchen , Gamze Gürsoy , Jing Zhang , Becky C. Carlyle , Angus C. Nairn , Mingfeng Li , Sirisha Pochareddy , Nenad Sestan , Mario Skarica , Zhen Li , Andre M. M. Sousa , Gabriel Santpere , Jinmyung Choi , Ying Zhu , Tianliuyun Gao , Daniel J. Miller , Adriana Cherskov , Mo Yang , Anahita Amiri , Gianfilippo Coppola , Jessica Mariani , Soraya Scuderi , Anna Szekely , Flora M. Vaccarino , Feinan Wu , Sherman Weissman , Tanmoy Roychowdhury , Alexej Abyzov ,

INTRODUCTION Strong genetic associations have been found for a number of psychiatric disorders. However, understanding the underlying molecular mechanisms remains challenging.

RATIONALE To address this challenge, the PsychENCODE Consortium has developed a comprehensive online resource and integrative models for the functional genomics of the human brain.

RESULTS The base of the pyramidal resource is the datasets generated by PsychENCODE, including bulk transcriptome, chromatin, genotype, and Hi-C datasets and single-cell transcriptomic data from ~32,000 cells for major brain regions. We have merged these with data from Genotype-Tissue Expression (GTEx), ENCODE, Roadmap Epigenomics, and single-cell analyses. Via uniform processing, we created a harmonized resource, allowing us to survey functional genomics data on the brain over a sample size of 1866 individuals.

From this uniformly processed dataset, we created derived data products. These include lists of brain-expressed genes, coexpression modules, and single-cell expression profiles for many brain cell types; ~79,000 brain-active enhancers with associated Hi-C loops and topologically associating domains; and ~2.5 million expression quantitative-trait loci (QTLs) comprising ~238,000 linkage-disequilibrium–independent single-nucleotide polymorphisms, as well as other types of QTLs associated with splice isoforms, cell fractions, and chromatin activity. By using these, we found that >88% of the cross-population variation in brain gene expression can be accounted for by cell fraction changes. Furthermore, a number of disorders and aging are associated with changes in cell-type proportions. The derived data also enable comparison between the brain and other tissues. In particular, by using spectral analyses, we found that the brain has distinct expression and epigenetic patterns, including a greater extent of noncoding transcription than other tissues.
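The statement that cell fraction changes account for most of the variation in bulk expression can be illustrated with a toy deconvolution. The sketch below is not the PsychENCODE pipeline, which the abstract does not describe. It assumes that bulk expression for one individual is approximately a non-negative mixture of per-cell-type signature profiles, and it fits the mixture weights with simple multiplicative updates. The function names `estimate_cell_fractions` and `variance_explained`, and all numbers, are hypothetical.

```python
def estimate_cell_fractions(signatures, bulk, n_iter=500):
    """Fit non-negative weights w so that sum_k signatures[g][k] * w[k] ~ bulk[g].

    signatures: one row per gene, one column per cell type (non-negative).
    bulk: one bulk expression value per gene (non-negative).
    Multiplicative updates keep w non-negative when the inputs are non-negative.
    """
    n_genes = len(signatures)
    n_types = len(signatures[0])
    # C^T b and C^T C do not change across iterations, so compute them once.
    ctb = [sum(signatures[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    ctc = [[sum(signatures[g][k] * signatures[g][j] for g in range(n_genes))
            for j in range(n_types)]
           for k in range(n_types)]
    w = [1.0 / n_types] * n_types  # start from equal weights
    eps = 1e-12                    # guards against division by zero
    for _ in range(n_iter):
        ctcw = [sum(ctc[k][j] * w[j] for j in range(n_types))
                for k in range(n_types)]
        w = [w[k] * ctb[k] / (ctcw[k] + eps) for k in range(n_types)]
    return w


def variance_explained(signatures, bulk, w):
    """Return 1 - SS_res / SS_tot for the mixture reconstruction of bulk."""
    predicted = [sum(c * x for c, x in zip(row, w)) for row in signatures]
    mean = sum(bulk) / len(bulk)
    ss_res = sum((b - p) ** 2 for b, p in zip(bulk, predicted))
    ss_tot = sum((b - mean) ** 2 for b in bulk)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): 4 genes, 2 cell types, true weights 0.7 and 0.3.
signatures = [[10.0, 1.0], [1.0, 8.0], [5.0, 5.0], [0.0, 3.0]]
bulk = [7.3, 3.1, 5.0, 0.9]
fractions = estimate_cell_fractions(signatures, bulk)
score = variance_explained(signatures, bulk, fractions)
```

On real data the bulk profile would not be an exact mixture, so the score would fall below 1. The fitted weights are also only proportional to cell fractions unless they are normalized to sum to 1. A library solver such as `scipy.optimize.nnls` would be the more usual choice than hand-written updates.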
The top level of the resource consists of integrative networks for regulation and machine-learning models for disease prediction. The networks include a full gene regulatory network (GRN) for the brain, linking transcription factors, enhancers, and target genes from merging of the QTLs, generalized element-activity correlations, and Hi-C data. By using this network, we link disease genes to genome-wide association study (GWAS) variants for psychiatric disorders. For schizophrenia, we linked 321 genes to the 142 reported GWAS loci. We then embedded the regulatory network into a deep-learning model to predict psychiatric phenotypes from genotype and expression. Our model gives a ~6-fold improvement in prediction over additive polygenic risk scores. Moreover, it achieves a ~3-fold improvement over additive models, even when the gene expression data are imputed, highlighting the value of having just a small amount of transcriptome data for disease prediction. Lastly, it highlights key genes and pathways associated with disorder prediction, including immunological, synaptic, and metabolic pathways, recapitulating de novo results from more targeted analyses.

CONCLUSION Our resource and integrative analyses have uncovered genomic elements and networks in the brain, which in turn have provided insight into the molecular mechanisms underlying psychiatric disorders. Our deep-learning model improves disease risk prediction over traditional approaches and can be extended with additional data types (e.g., microRNA and neuroimaging).

A comprehensive functional genomic resource for the adult human brain. The resource forms a three-layer pyramid. The bottom layer includes sequencing datasets for traits, such as schizophrenia. The middle layer represents derived datasets, including functional genomic elements and QTLs. The top layer contains integrated models, which link genotypes to phenotypes.
DSPN, Deep Structured Phenotype Network; PC1 and PC2, principal components 1 and 2; ref, reference; alt, alternate; H3K27ac, histone H3 acetylation at lysine 27.

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.
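For context on the baseline the deep-learning model is compared against, an additive polygenic risk score is conventionally a weighted sum of allele dosages. The sketch below shows only that generic form. It is not taken from the paper, and the function name `polygenic_risk_score`, the effect sizes, and the dosages are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive score: sum over variants of effect size times allele dosage.

    dosages: count of the effect allele per variant (0, 1, or 2).
    effect_sizes: per-variant weights, e.g. GWAS log odds ratios.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


# Hypothetical example: three variants for one individual.
effect_sizes = [0.10, -0.05, 0.20]
dosages = [2, 1, 0]
score = polygenic_risk_score(dosages, effect_sizes)
```

Because each variant contributes independently, a score of this form cannot capture interactions between variants or intermediate layers such as gene expression. The abstract's model adds these through an embedded regulatory network.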

Updated: 2018-12-13