A feature-fusion framework of clinical, genomics, and histopathological data for METABRIC breast cancer subtype classification
Applied Soft Computing (IF 7.2), Pub Date: 2020-03-24, DOI: 10.1016/j.asoc.2020.106238
Ala’a El-Nabawy, Nashwa El-Bendary, Nahla A. Belal

Breast cancer is the most common cancer among women worldwide. It has been phenotypically classified into five subtypes, each with unique characteristics that reflect the heterogeneity of breast tumours. In 2012, the American Association for Cancer Research provided population-based molecular integrative clusters (IntClust) for the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) dataset, resulting in ten subtypes. Previous work on the METABRIC dataset used only gene expression data to identify the effective genes for each subtype, without integrating the other data sources. This paper presents a breast cancer subtype classification model that applies feature fusion to the METABRIC datasets, namely clinical, gene expression, Copy Number Aberrations (CNA), Copy Number Variations (CNV), and histopathological images. State-of-the-art machine learning classifiers, including Linear-SVM, Radial-SVM, Random Forests (RF), Ensemble SVM (E-SVM), and Boosting, were applied to different data profiles. For IntClust subtyping, the highest accuracy was 88.36%, achieved using Linear-SVM on the data profile with features fused from the clinical, gene expression, CNA, and CNV datasets, with Jaccard and Dice scores of 0.802 and 0.8835, respectively. For Pam50 subtyping, an accuracy of 97.1% was achieved, with Jaccard scores ranging from 0.9439 to 0.9472 and a Dice score of 0.971, using the Linear-SVM and E-SVM classifiers on several data profiles that include features from histopathological images. The study thus validates that feature fusion across the various METABRIC datasets improves breast cancer subtype classification performance. Moreover, histopathological images give promising results on Pam50 subtypes and are expected to improve accuracy for IntClust subtyping when applied to a larger population.
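
The abstract does not spell out how the features are fused or how the Jaccard and Dice scores are averaged, so the following is only a minimal sketch under stated assumptions: fusion is taken to be simple concatenation of per-patient feature vectors across data sources, and both scores are macro-averaged over classes. The function names (`fuse_features`, `jaccard_dice`), the patient IDs, and the toy values are hypothetical and not taken from the paper. No classifier is included; the fused vectors would be passed to one, such as a linear SVM.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_features(profiles: Dict[str, Dict[str, List[float]]],
                  sources: Sequence[str]) -> Dict[str, List[float]]:
    """Concatenate per-patient feature vectors from the chosen sources.

    Assumption: fusion is plain concatenation, and only patients that
    appear in every chosen source are kept.
    """
    shared = set(profiles[sources[0]])
    for name in sources[1:]:
        shared &= set(profiles[name])
    fused: Dict[str, List[float]] = {}
    for pid in sorted(shared):
        vec: List[float] = []
        for name in sources:
            vec.extend(profiles[name][pid])
        fused[pid] = vec
    return fused


def jaccard_dice(y_true: Sequence[str],
                 y_pred: Sequence[str]) -> Tuple[float, float]:
    """Macro-averaged Jaccard and Dice scores (averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    j_sum = d_sum = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        j_sum += tp / (tp + fp + fn)          # Jaccard = TP / (TP + FP + FN)
        d_sum += 2 * tp / (2 * tp + fp + fn)  # Dice = 2TP / (2TP + FP + FN)
    return j_sum / len(labels), d_sum / len(labels)


# Toy data: two hypothetical sources; patient P3 lacks expression data.
profiles = {
    "clinical": {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]},
    "expression": {"P1": [0.3], "P2": [0.7]},
}
fused = fuse_features(profiles, ["clinical", "expression"])

# Toy labels to exercise the metrics.
jaccard, dice = jaccard_dice(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Here `fused` keeps only P1 and P2, each with three features. Dropping patients that lack a source is one possible design choice; imputing the missing source would be another, and the abstract does not say which the authors used.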




Updated: 2020-03-24