Improving the classification of neuropsychiatric conditions using gene ontology terms as features.
American Journal of Medical Genetics Part B: Neuropsychiatric Genetics (IF 1.6), Pub Date: 2019-04-25, DOI: 10.1002/ajmg.b.32727
Thomas P Quinn, Samuel C Lee, Svetha Venkatesh, Thin Nguyen

Although neuropsychiatric disorders have an established genetic background, their molecular foundations remain elusive. This has prompted many investigators to search for explanatory biomarkers that can predict clinical outcomes. One approach uses machine learning to classify patients based on blood mRNA expression. However, these endeavors typically fail to achieve the high level of performance, stability, and generalizability required for clinical translation. Moreover, these classifiers can lack interpretability because not all genes have relevance to researchers. For this study, we hypothesized that annotation-based classifiers can improve classification performance, stability, generalizability, and interpretability. To this end, we evaluated the models of four classification algorithms on six neuropsychiatric data sets using four annotation databases. Our results suggest that the Gene Ontology Biological Process database can transform gene expression into an annotation-based feature space that is accurate and stable. We also show how annotation features can improve the interpretability of classifiers: because annotations are used to assign biological importance to genes, the biological importance of an annotation-based feature is the feature itself. In evaluating the annotation features, we find that top-ranked annotations tend to contain top-ranked genes, suggesting that the most predictive annotations are a superset of the most predictive genes. Based on this, and the fact that annotations are used routinely to assign biological importance to genetic data, we recommend transforming gene-level expression into annotation-level expression prior to the classification of neuropsychiatric conditions.
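
The transformation from gene-level to annotation-level expression recommended above can be sketched in a few lines. This is a minimal illustration, not the authors' method: the abstract does not say how member genes are combined, so averaging is an assumption here, and the function name `annotation_features`, the gene symbols, and the term IDs are all hypothetical.

```python
from statistics import mean


def annotation_features(expression, gene_sets):
    """Collapse one sample's gene-level expression into annotation-level features.

    expression: dict mapping gene symbol -> expression value for one sample.
    gene_sets:  dict mapping annotation ID -> list of member gene symbols.

    ASSUMPTION: each annotation is scored as the mean expression of its
    measured member genes. The abstract does not specify the aggregation.
    """
    features = {}
    for term, genes in gene_sets.items():
        # Keep only member genes that were actually measured in this sample.
        values = [expression[g] for g in genes if g in expression]
        if values:  # skip annotations with no measured member genes
            features[term] = mean(values)
    return features


# Hypothetical toy data: three measured genes, three annotation terms.
sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 9.0}
gene_sets = {
    "TERM_1": ["GENE_A", "GENE_B"],
    "TERM_2": ["GENE_C", "GENE_X"],  # GENE_X is not measured
    "TERM_3": ["GENE_Y"],            # no measured members, so dropped
}
features = annotation_features(sample, gene_sets)
```

Applying the function to every sample would yield a sample-by-annotation matrix that can be passed to a classifier in place of the sample-by-gene matrix. Each resulting feature is a named annotation, which is the sense in which the feature carries its own biological interpretation.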

Updated: 2019-11-01