An improved clear cell renal cell carcinoma stage prediction model based on gene sets.
BMC Bioinformatics ( IF 3 ) Pub Date : 2020-06-08 , DOI: 10.1186/s12859-020-03543-0
Fangjun Li 1 , Mu Yang 2 , Yunhe Li 1 , Mingqiang Zhang 1 , Wenjuan Wang 2 , Dongfeng Yuan 1 , Dongqi Tang 2

Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma and a cause of cancer-related deaths. Survival rates are very low when the tumor is discovered at a late stage. Developing an efficient strategy to stratify patients by cancer stage, and to uncover the mechanisms that drive the development and progression of the cancer, is therefore critical for early prevention and treatment. In this study, we developed new strategies to extract important gene features and trained machine learning-based classifiers to predict the stage of ccRCC samples. The novelty of our approach is threefold. (i) We improved the feature preprocessing procedure by binning and coding, which increased the stability of the data and the robustness of the classification model. (ii) We proposed a joint gene selection algorithm that combines the Fast-Correlation-Based Filter (FCBF) search with the information value, the linear correlation coefficient, and the variance inflation factor to remove irrelevant and redundant features; a logistic regression-based feature selection method was then used to determine influencing factors. (iii) Classification models were developed using machine learning algorithms. The method was evaluated on RNA expression values of ccRCC samples from The Cancer Genome Atlas (TCGA). On the testing set, our model (accuracy 81.15%, AUC 0.86) outperformed state-of-the-art models (accuracy 72.64%, AUC 0.81). We also developed a gene set, FJL-set, containing 23 genes, far fewer than 64. Furthermore, gene function analysis was used to explore molecular mechanisms that might affect cancer development. The results suggest that our model can extract more prognostic information and is worthy of further investigation and validation in order to understand the progression mechanism.
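To make the "binning and coding" and "information value" steps more concrete, the following is a minimal Python sketch of how a continuous expression feature might be binned, coded by weight of evidence (WOE), and scored by information value (IV) against a binary label. The abstract does not specify the binning scheme, the smoothing, or how stages are grouped, so the quantile bins, the `eps` smoothing term, the binary early/late label, and the function name `information_value` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def information_value(feature, label, n_bins=5, eps=0.5):
    """Quantile-bin a continuous feature and return (woe_per_bin, iv).

    Hypothetical helper: the bin count, quantile binning, and eps
    smoothing are illustrative assumptions, not taken from the paper.
    """
    feature = np.asarray(feature, dtype=float)
    label = np.asarray(label, dtype=int)

    # Interior quantile edges only, so np.digitize yields bins 0..n_bins-1.
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)

    pos_total = label.sum()
    neg_total = len(label) - pos_total

    woe = np.zeros(n_bins)
    iv = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        # eps smoothing avoids log(0) when a bin lacks one of the classes.
        pos = (label[in_bin] == 1).sum() + eps
        neg = (label[in_bin] == 0).sum() + eps
        p_pos = pos / (pos_total + eps * n_bins)
        p_neg = neg / (neg_total + eps * n_bins)
        woe[b] = np.log(p_pos / p_neg)
        iv += (p_pos - p_neg) * woe[b]
    return woe, iv

# Synthetic data only: one feature correlated with the label, one pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
informative = y + rng.normal(scale=0.5, size=400)
noise = rng.normal(size=400)

_, iv_informative = information_value(informative, y)
_, iv_noise = information_value(noise, y)
print(iv_informative > iv_noise)
```

In a filter of this kind, features whose IV falls below a chosen threshold would be dropped as uninformative, and each retained feature would be replaced by the WOE code of its bin before further redundancy checks.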

Updated: 2020-06-08