PAN: Personalized Annotation-Based Networks for the Prediction of Breast Cancer Relapse
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-04-28, DOI: 10.1109/tcbb.2021.3076422
Thin Nguyen, Samuel Lee, Thomas Quinn, Buu Truong, Xiaomei Li, Truyen Tran, Svetha Venkatesh, Thuc Duy Le

The classification of clinical samples based on gene expression data is an important part of precision medicine. In this manuscript, we show how transforming gene expression data into a set of personalized (sample-specific) networks allows us to harness existing graph-based methods to improve classifier performance. Existing approaches to personalized gene networks are limited in that they depend on the other samples in the data and must be recomputed whenever a new sample is introduced. Here, we propose a novel method, called Personalized Annotation-based Networks (PAN), that avoids this limitation by using curated annotation databases to transform gene expression data into a graph. Unlike competing methods, PANs are calculated for each sample independently of the population, making them a more efficient way to obtain single-sample networks. Using three breast cancer datasets as a case study, we show that PAN classifiers not only predict cancer relapse better than gene features alone, but also outperform PPI (protein-protein interaction) and population-level graph-based classifiers. This work demonstrates the practical advantages of graph-based classification for high-dimensional genomic data, while offering a new approach to making sample-specific networks. Supplementary information: PAN and the baselines are implemented in Python. Source code and data are available at https://github.com/thinng/PAN.
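The abstract does not spell out how a PAN is constructed, so the following is only a minimal sketch of the general idea of a sample-specific, annotation-based graph. It is not the authors' implementation. It assumes that nodes are annotation terms, that two terms are linked when they share genes, and that weights are mean expression values. The function name `build_sample_network`, the gene and term names, and the weighting scheme are all hypothetical. The point it illustrates is that the graph is computed from one sample's expression profile plus a fixed annotation table, so no other samples are needed.

```python
from itertools import combinations


def build_sample_network(expression, annotations):
    """Build a toy annotation-based graph for ONE sample.

    expression:  dict mapping gene -> expression value in this sample.
    annotations: dict mapping annotation term -> set of member genes
                 (a stand-in for a curated annotation database).
    Returns (nodes, edges) where nodes maps term -> weight and
    edges maps (term_a, term_b) -> weight.
    """
    # Assumed node weight: mean expression of the term's measured genes.
    nodes = {}
    for term, genes in annotations.items():
        measured = [expression[g] for g in genes if g in expression]
        if measured:
            nodes[term] = sum(measured) / len(measured)

    # Assumed edge rule: connect two terms that share measured genes,
    # weighted by the mean expression of the shared genes.
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = [g for g in annotations[a] & annotations[b] if g in expression]
        if shared:
            edges[(a, b)] = sum(expression[g] for g in shared) / len(shared)
    return nodes, edges


# Hypothetical toy inputs: one sample and three annotation terms.
annotations = {
    "term_1": {"g1", "g2"},
    "term_2": {"g2", "g3"},
    "term_3": {"g4"},
}
sample = {"g1": 2.0, "g2": 4.0, "g3": 6.0, "g4": 1.0}

nodes, edges = build_sample_network(sample, annotations)
print(nodes)
print(edges)
```

Because the function only reads `sample` and `annotations`, adding a new sample to a dataset would not require recomputing the networks of existing samples, which is the property the abstract contrasts with population-dependent approaches.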

Updated: 2021-04-28