Systematic analysis of binding of transcription factors to noncoding variants
Nature (IF 50.5), Pub Date: 2021-01-27, DOI: 10.1038/s41586-021-03211-0
Jian Yan 1, 2, 3, 4 , Yunjiang Qiu 2, 5 , André M Ribeiro Dos Santos 2, 6 , Yimeng Yin 4, 7 , Yang E Li 2, 8 , Nick Vinckier 9 , Naoki Nariai 9 , Paola Benaglio 9 , Anugraha Raman 2, 5 , Xiaoyu Li 1, 3 , Shicai Fan 9 , Joshua Chiou 9 , Fulin Chen 1 , Kelly A Frazer 9 , Kyle J Gaulton 9 , Maike Sander 8, 9 , Jussi Taipale 4, 7, 10 , Bing Ren 2, 8, 11

Many sequence variants have been linked to complex human traits and diseases [1], but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein–DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor–DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.
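To make the position weight matrix (PWM) approach mentioned in the abstract concrete, the following is a minimal sketch of how a PWM can score the reference and alternate alleles of a variant and report the difference. It is not the authors' pipeline: the count matrix, the sequences, the pseudocount, the uniform background, and the function names are all invented for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix (rows: motif positions; columns: A, C, G, T).
# These counts are made up for illustration and are not taken from the paper.
COUNTS = [
    [8, 1, 1, 0],
    [0, 0, 10, 0],
    [0, 9, 0, 1],
    [1, 0, 0, 9],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert raw counts to log2 odds against a uniform background (an assumption)."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix


def best_window_score(seq, pwm):
    """Return the highest PWM score over all forward-strand windows of seq."""
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        scores.append(sum(pwm[i][BASES.index(b)] for i, b in enumerate(window)))
    return max(scores)


def allele_delta(ref_seq, alt_seq, pwm):
    """Score difference (alt minus ref); negative means the alt allele scores lower."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


PWM = log_odds_matrix(COUNTS)
ref = "TTAGCTAA"  # hypothetical reference allele context containing the motif AGCT
alt = "TTAACTAA"  # same context with a G>A substitution inside the motif
delta = allele_delta(ref, alt, PWM)
print(f"PWM score change (alt - ref): {delta:.2f}")
```

A PWM treats each motif position independently, which is one plausible reason such a score can miss effects that a gapped k-mer support vector machine captures; the sketch above only illustrates the scoring step, not the comparison reported in the abstract.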




Updated: 2021-01-27