The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures
Genetic Epidemiology (IF 1.7), Pub Date: 2021-05-03, DOI: 10.1002/gepi.22383
Kathleen D Askland, David Strong, Marvin N Wright, Jason H Moore

The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset-/pathways-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case–control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.
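
As a rough illustration of the first two components described above, the following Python sketch collapses per-variant genotype calls into gene-level burden features using an annotation map, then applies a label-permutation filter to drop uninformative features. Everything in it is an assumption made for illustration: the function names, the toy data, the variant-to-gene map, and the choice of a permutation test on the case-control mean difference. The published TM pipeline runs in R, and the abstract does not name its filtering statistic or its feature-selection algorithm, so the third component is not sketched here.

```python
import random
from collections import defaultdict


def gene_burden_features(genotypes, variant_to_gene):
    """Collapse per-variant minor-allele counts (0/1/2) into per-gene burden scores.

    Hypothetical helper: variants missing from the annotation map are ignored.
    """
    features = []
    for sample in genotypes:
        burden = defaultdict(int)
        for variant, count in sample.items():
            gene = variant_to_gene.get(variant)
            if gene is not None:
                burden[gene] += count
        features.append(dict(burden))
    return features


def mean_difference(values, labels):
    """Absolute difference between the case mean and the control mean."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_filter(features, labels, n_perm=200, alpha=0.05, seed=0):
    """Keep features whose case-control difference is unlikely under shuffled labels.

    A stand-in for a model-free, nonparametric filter; not the authors' procedure.
    """
    rng = random.Random(seed)
    genes = sorted({gene for row in features for gene in row})
    kept = []
    for gene in genes:
        values = [row.get(gene, 0) for row in features]
        observed = mean_difference(values, labels)
        shuffled = list(labels)
        exceed = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            if mean_difference(values, shuffled) >= observed:
                exceed += 1
        # Add-one correction so the empirical p-value is never exactly zero.
        p_value = (exceed + 1) / (n_perm + 1)
        if p_value <= alpha:
            kept.append(gene)
    return kept


# Toy annotation map (assumed): two variants in GENE_A, one in GENE_B.
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}

# Toy cohort (assumed): 10 cases each carry one GENE_A variant; 10 controls carry none.
# Every sample carries v3, so GENE_B is uninformative; v4 has no annotation.
cases = [{"v1" if i % 2 == 0 else "v2": 1, "v3": 1, "v4": 1} for i in range(10)]
controls = [{"v3": 1} for _ in range(10)]
genotypes = cases + controls
labels = [1] * 10 + [0] * 10

features = gene_burden_features(genotypes, variant_to_gene)
kept = permutation_filter(features, labels)
print(kept)
```

In this toy cohort no single variant is shared by all cases, yet the collapsed GENE_A feature separates cases from controls, which is the kind of heterogeneity a gene-level feature can capture. GENE_B is constant across samples and is filtered out.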

Updated: 2021-06-23