Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder
Science (IF 44.7), Pub Date: 2018-12-13, DOI: 10.1126/science.aat6576
Joon-Yong An 1 , Kevin Lin 2 , Lingxue Zhu 2 , Donna M Werling 1 , Shan Dong 1 , Harrison Brand 3, 4, 5 , Harold Z Wang 3 , Xuefang Zhao 3, 4, 5 , Grace B Schwartz 1 , Ryan L Collins 3, 4, 6 , Benjamin B Currall 3, 4, 5 , Claudia Dastmalchi 1 , Jeanselle Dea 1 , Clif Duhn 1 , Michael C Gilson 1 , Lambertus Klei 7 , Lindsay Liang 1 , Eirene Markenscoff-Papadimitriou 1 , Sirisha Pochareddy 8 , Nadav Ahituv 9, 10 , Joseph D Buxbaum 11, 12, 13, 14 , Hilary Coon 15, 16 , Mark J Daly 5, 17, 18 , Young Shin Kim 1 , Gabor T Marth 19, 20 , Benjamin M Neale 5, 17, 18 , Aaron R Quinlan 16, 19, 20 , John L Rubenstein 1 , Nenad Sestan 8 , Matthew W State 1, 10 , A Jeremy Willsey 1, 21, 22 , Michael E Talkowski 3, 4, 5, 23 , Bernie Devlin 7 , Kathryn Roeder 2, 24 , Stephan J Sanders 1, 10

INTRODUCTION The DNA of protein-coding genes is transcribed into mRNA, which is translated into proteins. The “coding genome” describes the DNA that contains the information to make these proteins and represents ~1.5% of the human genome. Newly arising de novo mutations (variants observed in a child but not in either parent) in the coding genome contribute to numerous childhood developmental disorders, including autism spectrum disorder (ASD). Discovery of these effects is aided by the triplet code that enables the functional impact of many mutations to be readily deciphered. In contrast, the “noncoding genome” covers the remaining ~98.5% and includes elements that regulate when, where, and to what degree protein-coding genes are transcribed. Understanding this noncoding sequence could provide insights into human disorders and refined control of emerging genetic therapies. Yet little is known about the role of mutations in noncoding regions, including whether they contribute to childhood developmental disorders, which noncoding elements are most vulnerable to disruption, and the manner in which information is encoded in the noncoding genome.
RATIONALE Whole-genome sequencing (WGS) provides the opportunity to identify the majority of genetic variation in each individual. By performing WGS on 1902 quartet families including a child affected with ASD, one unaffected sibling control, and their parents, we identified ~67 de novo mutations across each child’s genome. To characterize the functional role of these mutations, we integrated multiple datasets relating to gene function, genes implicated in neurodevelopmental disorders, conservation across species, and epigenetic markers, thereby combinatorially defining 55,143 categories. The scope of the problem, testing for an excess of de novo mutations in cases relative to controls for each category, is challenging because there are more categories than families.
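
The per-category comparison described in the rationale can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a two-sided exact binomial test of the case share of mutations against 0.5 (each family contributes one case and one sibling control) and a Bonferroni threshold across the 55,143 categories. The function names are hypothetical, the counts are made up, and the study's actual test and correction procedure may differ.

```python
from math import comb

N_CATEGORIES = 55_143  # number of annotation categories stated in the abstract


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def category_test(case_count: int, control_count: int, alpha: float = 0.05):
    """Test one category for unequal mutation counts in cases vs. controls."""
    n = case_count + control_count
    p_value = binomial_two_sided_p(case_count, n)
    threshold = alpha / N_CATEGORIES  # Bonferroni correction (an assumption)
    return p_value, p_value < threshold


# Made-up counts for illustration only (not data from the study).
p_value, significant = category_test(case_count=70, control_count=40)
print(f"p = {p_value:.4g}, significant after correction: {significant}")
```

With tens of thousands of categories, a per-category threshold of this kind is very stringent, which is one way to see why modest effects are hard to detect when each category is tested in isolation.
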
RESULTS Comparing cases to controls, we observed an excess of de novo mutations in cases in individual categories in the coding genome but not in the noncoding genome. To overcome the challenge of detecting noncoding association, we used machine learning tools to develop a de novo risk score to look for an excess of de novo mutations across multiple categories. This score demonstrated a contribution to ASD risk from coding mutations and a weaker, but significant, contribution from noncoding mutations. This noncoding signal was driven by mutations in the promoter region, defined as the 2000 nucleotides upstream of the transcription start site (TSS) where mRNA synthesis starts. The strongest promoter signals were defined by conservation across species and transcription factor binding sites. Well-defined promoter elements (e.g., TATA-box) are usually observed within 80 nucleotides of the TSS; however, the strongest ASD association was observed distally, 750 to 2000 nucleotides upstream of the TSS.
CONCLUSION We conclude that de novo mutations in the noncoding genome contribute to ASD. The clearest evidence of noncoding ASD association came from mutations at evolutionarily conserved nucleotides in the promoter region. The enrichment for transcription factor binding sites, primarily in the distal promoter, suggests that these mutations may disrupt gene transcription via their interaction with enhancer elements in the promoter region, rather than interfering with transcriptional initiation directly.
Promoter regions in autism. De novo mutations from 1902 quartet families are assigned to 55,143 annotation categories, which are each assessed for autism spectrum disorder (ASD) association by comparing mutation counts in cases and sibling controls. A de novo risk score demonstrated a noncoding contribution to ASD driven by promoter mutations, especially at sites conserved across species, in the distal promoter or targeted by transcription factors.
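
The promoter definition given in the results (2000 nucleotides upstream of the TSS, with the strongest signal 750 to 2000 nucleotides upstream) can be expressed as a small position classifier. This is a sketch under assumptions not stated in the abstract: coordinates are integer positions on one chromosome, boundaries are treated as inclusive, the label "proximal promoter" for the first 749 upstream nucleotides is introduced here for illustration, and the function names are hypothetical.

```python
PROMOTER_LENGTH = 2000  # nucleotides upstream of the TSS (from the abstract)
DISTAL_START = 750      # distal window starts this far upstream (from the abstract)


def upstream_distance(variant_pos: int, tss_pos: int, strand: str) -> int:
    """Distance upstream of the TSS; zero or negative means at or downstream
    of the TSS. On the minus strand, upstream means larger coordinates."""
    if strand == "+":
        return tss_pos - variant_pos
    if strand == "-":
        return variant_pos - tss_pos
    raise ValueError("strand must be '+' or '-'")


def promoter_region(variant_pos: int, tss_pos: int, strand: str) -> str:
    """Classify a variant relative to the promoter window of one gene."""
    d = upstream_distance(variant_pos, tss_pos, strand)
    if d < 1 or d > PROMOTER_LENGTH:
        return "outside promoter"
    if d >= DISTAL_START:
        return "distal promoter"
    return "proximal promoter"  # illustrative label, not from the abstract


# Hypothetical positions: a variant 1000 nt upstream of a plus-strand TSS.
print(promoter_region(variant_pos=9_000, tss_pos=10_000, strand="+"))
```

Handling the strand explicitly matters because "upstream" refers to the direction opposite to transcription, which corresponds to larger genomic coordinates for genes on the minus strand.
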
Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we identified 255,106 de novo mutations among sample genomes from members of 1902 quartet families in which one child, but not a sibling or their parents, was affected by autism spectrum disorder (ASD). In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, was significantly associated with ASD. Casting noncoding variation in the context of a de novo risk score across multiple annotation categories, however, did demonstrate association with mutations localized to promoter regions. We found that the strongest driver of this promoter signal emanates from evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.

Updated: 2018-12-13