An improved clear cell renal cell carcinoma stage prediction model based on gene sets.
BMC Bioinformatics (IF 2.9). Pub Date: 2020-06-08. DOI: 10.1186/s12859-020-03543-0
Fangjun Li, Mu Yang, Yunhe Li, Mingqiang Zhang, Wenjuan Wang, Dongfeng Yuan, Dongqi Tang

Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma and accounts for the majority of kidney cancer-related deaths. Survival rates are very low when the tumor is discovered at a late stage. Developing an efficient strategy to stratify patients by cancer stage, and to uncover the underlying mechanisms that drive cancer development and progression, is therefore critical for early prevention and treatment. In this study, we developed new strategies to extract important gene features and trained machine learning-based classifiers to predict the stage of ccRCC samples. The novelty of our approach is threefold: (i) we improved the feature preprocessing procedure by binning and coding, which increased the stability of the data and the robustness of the classification model; (ii) we proposed a joint gene selection algorithm that combines Fast Correlation-Based Filter (FCBF) search with the information value, the linear correlation coefficient, and the variance inflation factor to remove irrelevant and redundant features, and then applied a logistic regression-based feature selection method to determine the influencing factors; and (iii) we developed classification models using machine learning algorithms. The method was evaluated on RNA expression values of ccRCC samples from The Cancer Genome Atlas (TCGA). On the testing set, our model achieved an accuracy of 81.15% and an AUC of 0.86, outperforming state-of-the-art models (accuracy of 72.64% and AUC of 0.81). We also derived a gene set, FJL-set, that contains 23 genes, far fewer than the 64 genes used by the compared state-of-the-art model. Furthermore, a gene function analysis was used to explore molecular mechanisms that might affect cancer development. The results suggest that our model extracts more prognostic information and merits further investigation and validation to better understand the mechanisms of ccRCC progression.
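
The abstract names the preprocessing and selection steps but gives no formulas. The Python sketch below illustrates one plausible reading of steps (i) and (ii), under stated assumptions: equal-frequency binning of each gene's expression values, weight-of-evidence (WoE) coding of the bins, information value (IV) computed from WoE, and the standard FCBF procedure using symmetric uncertainty on the binned values. The bin count, the `min_iv` threshold, the "IV filter, then FCBF" ordering in `select_genes`, the binary label (1 = late stage, 0 = early stage), and every function name are hypothetical and are not the authors' code; the linear-correlation, VIF, and logistic-regression steps are omitted.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins=4):
    """Map each value to a bin index 0..n_bins-1 by rank; ties are split by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def woe_iv(bins, labels, eps=0.5):
    """Weight of evidence per bin and total information value (assumed labels: 1 = late, 0 = early)."""
    levels = sorted(set(bins))
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    woe, iv = {}, 0.0
    for b in levels:
        # additive smoothing keeps log() finite when a bin holds only one class
        p = (pos[b] + eps) / (n_pos + eps * len(levels))
        q = (neg[b] + eps) / (n_neg + eps * len(levels))
        woe[b] = math.log(p / q)
        iv += (p - q) * woe[b]
    return woe, iv


def woe_encode(bins, woe):
    """Coding step: replace each bin index with that bin's WoE value."""
    return [woe[b] for b in bins]


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU = 2 * IG(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(xs, ys)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """FCBF on discrete features (dict name -> values); returns the selected names."""
    su_c = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    # relevance filter: keep features with SU(feature, class) > delta, strongest first
    ranked = sorted((f for f in features if su_c[f] > delta), key=lambda f: -su_c[f])
    selected = []
    for f in ranked:
        # f is redundant if an already-selected feature g satisfies SU(g, f) >= SU(f, class)
        if all(symmetric_uncertainty(features[g], features[f]) < su_c[f] for g in selected):
            selected.append(f)
    return selected


def select_genes(expression, labels, n_bins=4, min_iv=0.1, delta=0.0):
    """Hypothetical combination: drop genes with IV below min_iv, then run FCBF on the rest."""
    binned = {g: equal_frequency_bins(v, n_bins) for g, v in expression.items()}
    kept = {g: b for g, b in binned.items() if woe_iv(b, labels)[1] >= min_iv}
    return fcbf(kept, labels, delta)
```

Consider a toy example with eight samples (four early, four late). Here `gene_a` separates the stages perfectly, `gene_b` is a rescaled copy of it, and `gene_c` is uninformative. Calling `select_genes(..., n_bins=2)` keeps only `gene_a`, because `gene_c` fails the IV filter and FCBF drops `gene_b` as redundant with `gene_a`.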

Updated: 2020-06-08