A framework to score the effects of structural variants in health and disease
Genome Research (IF 7), Pub Date: 2022-04-01, DOI: 10.1101/gr.275995.121
Philip Kleinert (1), Martin Kircher (1, 2)

Although technological advances have improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods rely on individual mechanistic principles, such as the deletion of coding sequence or the disruption of 3D genome architecture. However, a comprehensive tool that uses the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited by the small and biased set of SVs annotated as pathogenic or benign. We overcome this problem by using a surrogate training objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human- and chimpanzee-derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and an offline-scoring tool for easy application of CADD-SV.
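The training setup summarized in the abstract can be sketched roughly as follows. Proxy-neutral SVs are contrasted with matched simulated SVs, and each SV is described by summary statistics over annotation tracks. This is a minimal illustration under stated assumptions, not CADD-SV's implementation. The following are all hypothetical: the `SV` record, the `Tracks` layout (track name → chromosome → position → score), the choice of mean/min/max/annotated-fraction as statistics, and matching simulated variants by type and length only.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical annotation layout: track name -> chromosome -> 0-based position -> per-base score.
Tracks = Dict[str, Dict[str, Dict[int, float]]]


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    svtype: str  # e.g. "DEL", "DUP", "INS"


def summarize(sv: SV, per_base: Dict[int, float]) -> List[float]:
    """Mean, min, max and annotated fraction of one track inside the SV interval (assumed statistics)."""
    values = [per_base[p] for p in range(sv.start, sv.end) if p in per_base]
    if not values:
        # No annotated bases (or a zero-length interval): report zeros.
        return [0.0, 0.0, 0.0, 0.0]
    return [mean(values), min(values), max(values), len(values) / (sv.end - sv.start)]


def featurize(sv: SV, tracks: Tracks) -> List[float]:
    """Concatenate per-track summaries in a fixed (sorted) track order so columns line up."""
    row: List[float] = []
    for name in sorted(tracks):
        row.extend(summarize(sv, tracks[name].get(sv.chrom, {})))
    return row


def simulate_matched(sv: SV, chrom_len: int, rng: random.Random) -> SV:
    """Hypothetical matching: same type and length, placed uniformly on the same chromosome."""
    length = sv.end - sv.start
    start = rng.randrange(0, chrom_len - length + 1)
    return SV(sv.chrom, start, start + length, sv.svtype)


def build_training_set(neutral: List[SV], chrom_lens: Dict[str, int],
                       tracks: Tracks, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Label proxy-neutral SVs 0 and their matched simulated counterparts 1 (proxy-deleterious)."""
    rng = random.Random(seed)
    simulated = [simulate_matched(sv, chrom_lens[sv.chrom], rng) for sv in neutral]
    X = [featurize(sv, tracks) for sv in neutral + simulated]
    y = [0] * len(neutral) + [1] * len(simulated)
    return X, y


if __name__ == "__main__":
    # Toy conservation-like track: high scores in 40..59, low elsewhere.
    tracks = {"conservation": {"chr1": {i: (0.9 if 40 <= i < 60 else 0.1) for i in range(100)}}}
    neutral = [SV("chr1", 0, 10, "DEL"), SV("chr1", 80, 95, "DEL")]
    X, y = build_training_set(neutral, {"chr1": 100}, tracks)
    print(len(X), y)  # 4 [0, 0, 1, 1]
```

The resulting `X` and `y` could be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier` via `fit`. The predicted probability of class 1 from `predict_proba` would then serve as an unscaled deleteriousness score. How CADD-SV actually builds its features and scales its scores is described in the paper and may differ from this sketch.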

Updated: 2022-04-01