Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer
Introduction
Oral tongue squamous cell carcinoma (OTSCC) is squamous cell carcinoma arising from the anterior two thirds of the tongue (also known as the mobile tongue). It is usually reported as part of oral squamous cell carcinoma (OSCC), which includes all anatomical subsites of the oral cavity. A recent international study including 22 registries reported 89,212 incident cases of OTSCC and an increasing annual incidence [1], which has been confirmed by others [2]. The primary treatment of choice for OTSCC is surgical excision. However, even early-stage tumors may behave aggressively [3,4]. Thus, OTSCC with aggressive behavior and advanced-stage OTSCC require multimodality treatment, including neck dissection and adjuvant (chemo)radiotherapy. It is therefore important to estimate the clinical behavior and outcome of OTSCC precisely. Predicting the risk of recurrence is one of the important assessments for the clinician during treatment planning. Early diagnosis and recurrence-risk prediction are particularly important in the management of OTSCC because a recent analysis of Finnish cases reported that about 67% of OTSCC cases were diagnosed at an early stage (I-II) [5]. With accurate and timely recurrence prediction, high-risk cases of OTSCC can be identified and multimodality treatment applied accordingly. In a large cohort of early OTSCC, about one fourth of cases (27.8%) developed a recurrence, and all of them might have benefitted from early prediction and corresponding treatment planning [6].
Many recent studies have examined the use of machine learning (ML) techniques for prognostication of different cancers [7,8]. Notably, ML techniques have predicted patient outcome with better accuracy than Cox regression [9]. For this reason, ML has been an active research focus in recent years. For instance, ML techniques have been used to predict the outcome of various cancer types [10–12], and a web-based tool that uses an artificial neural network to predict cancer outcome has been reported [13].
In this study, we examined four different ML algorithms, namely, support vector machine (SVM), naive Bayes (NB), boosted decision tree (BDT), and decision forest (DF), in terms of their performance in predicting locoregional recurrence in OTSCC patients. We also evaluated the permutation feature importance (PFI) of the input variables for these algorithms. Many researchers have used this approach to compare ML techniques for survival prediction in different malignancies such as breast and lung cancers [14–17]. For example, Tapak et al. examined six ML algorithms and two traditional methods for the prediction of breast cancer survival and metastasis [15]. In our study, we aimed to identify the algorithm that most effectively classifies patients as being at either low or high risk of OTSCC recurrence. The algorithm with the best overall classification performance was further compared to a recently reported risk model based on the depth of invasion (DOI) [18]. This comparison was motivated by the fact that a DOI of 4 mm or deeper has been considered a factor that accurately predicts locoregional recurrence [6]. Moreover, the recent American Joint Committee on Cancer (AJCC) 8th edition incorporated DOI into the T-stage [19]. Similarly, the study by Almangush et al. suggested that DOI is one of the strongest pathological predictors for locoregional recurrence [6]. This suggestion is in agreement with reports by others [20,21].
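As an illustration of the permutation feature importance idea mentioned above, the following minimal sketch shuffles one input column at a time and records the resulting drop in accuracy. It is not the Azure implementation used in the study: the function names, the toy rows, and the stand-in `toy_model` are all assumptions made for the example, and accuracy is used only because it is the simplest score to show.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the other features untouched.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Invented toy data: columns are [doi_mm, age_years]; label 1 = recurrence.
rows = [[1.0, 70], [2.0, 45], [3.0, 62], [5.0, 50],
        [6.5, 66], [8.0, 39], [2.5, 58], [7.0, 73]]
labels = [0, 0, 0, 1, 1, 1, 0, 1]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier; it uses only DOI."""
    return 1 if row[0] >= 4.0 else 0


baseline, importances = permutation_importance(toy_model, rows, labels)
print("baseline accuracy:", baseline)
print("importance per feature [doi_mm, age_years]:", importances)
```

A feature that the model ignores (here, age) receives an importance of zero, because shuffling it cannot change any prediction. The same loop applies to any classifier that exposes a prediction function.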
We hypothesize that the above-mentioned supervised learning classifiers can be used to predict OTSCC locoregional recurrences and will thereby add value to the management of OTSCC.
Patients
We used data from a study cohort comprising patients treated at the five Finnish University Hospitals of Helsinki, Oulu, Turku, Tampere, and Kuopio and at the A.C. Camargo Cancer Center, Sao Paulo, Brazil. Because this is a multicenter study from six institutions, the data for many cases were provided as locoregional recurrences without further specification. The clinicopathologic characteristics of this cohort have been previously reported and summarized [22]. The primary treatment for all cases was surgical
The training-validation phase for the algorithms in Microsoft Azure for prediction of recurrence
Microsoft Azure Machine Learning Studio (Azure ML 2019) was used in this study [27]. The data were preprocessed to handle missing values. The input parameters were age, gender, stage, grade, tumor budding, depth of invasion (DOI), worst pattern of invasion (WPOI), lymphocytic host response (LHR), perineural invasion (PNI), and treatment given, while the target output was locoregional recurrence. Disease-free survival (DFS) time of the cases ranged from 1 to 267 months. Specifically, the DFS in
Data description
The study cohort included 311 patients with cT1-T2cN0M0 OTSCC: 165 men and 146 women, resulting in a male-to-female ratio of 1.1:1. Of these 311 cases, 57 lacked information about postoperative treatment. These cases were therefore excluded, and the machine learning training was performed with 254 cases. These cases included 141 men and 113 women; the mean age at diagnosis was 61.51 years (SD 14.81; range 10–95) and the median age was 62.0 years. The distribution
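The exclusion step described above amounts to a complete-case filter on the treatment field. The sketch below shows that filter on a handful of invented records and then re-checks the counts reported in the text. The `Case` record, its field names, and the sample values are hypothetical and do not reflect the study's actual data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical patient record; field names are assumptions."""
    case_id: int
    sex: str                   # "M" or "F"
    treatment: Optional[str]   # None when postoperative treatment is unknown


def complete_cases(cases: List[Case]) -> List[Case]:
    """Keep only the cases whose treatment information is present."""
    return [c for c in cases if c.treatment is not None]


def male_to_female_ratio(cases: List[Case]) -> float:
    """Number of men divided by number of women."""
    men = sum(1 for c in cases if c.sex == "M")
    women = sum(1 for c in cases if c.sex == "F")
    return men / women


# Invented sample records, for illustration only.
cases = [
    Case(1, "M", "surgery"),
    Case(2, "F", None),
    Case(3, "F", "surgery + radiotherapy"),
    Case(4, "M", "surgery"),
    Case(5, "M", None),
]
kept = complete_cases(cases)
print([c.case_id for c in kept], male_to_female_ratio(kept))

# Arithmetic check of the figures reported in the text.
remaining = 311 - 57                      # cases left after exclusion
reported_ratio = round(165 / 146, 1)      # male-to-female ratio, full cohort
print(remaining, reported_ratio)
```

Dropping incomplete cases is only one way to handle missing values; the text does not say which other preprocessing steps, if any, were applied to the remaining fields.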
Discussion
The present study compared the performance of ML algorithms in stratifying patients with OTSCC into low- or high-risk groups for recurrence. In this regard, four ML algorithms, namely, boosted decision tree, naive Bayes, support vector machine, and decision forest, were examined. We found that the performance of these techniques was higher than that of the depth of invasion (DOI)-based approach. Our multicenter cohort of cases is one of the largest published series. Majority of the previous publications
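To make the comparison with a DOI-based approach concrete, the sketch below scores a single-variable rule (DOI of 4 mm or deeper predicts recurrence, the cutoff mentioned in the Introduction) against a set of classifier outputs using the same confusion-matrix summary. All values are invented for illustration, `model_pred` is a hypothetical list of predictions rather than output from any of the four algorithms, and the choice of accuracy, sensitivity, and specificity is an assumption, since this excerpt does not state which metrics the study reported.

```python
def doi_rule(doi_mm, cutoff_mm=4.0):
    """Predict recurrence (1) when depth of invasion reaches the cutoff."""
    return 1 if doi_mm >= cutoff_mm else 0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented values, for illustration only.
doi_mm = [1.5, 3.0, 4.0, 6.0, 2.0, 7.5, 5.0, 3.5]
recurred = [0, 1, 0, 1, 0, 1, 0, 0]
model_pred = [0, 1, 0, 1, 0, 1, 1, 0]   # hypothetical classifier output

baseline_pred = [doi_rule(d) for d in doi_mm]
baseline_scores = summarize(recurred, baseline_pred)
model_scores = summarize(recurred, model_pred)
print("DOI rule:", baseline_scores)
print("classifier:", model_scores)
```

Scoring both sets of predictions with the same function on the same cases keeps the comparison like for like.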
Authors' contributions
Institutional Coordinators: Salo T, Coletta RD, Kowalski LP, Leivo I, Mäkitie AA, Haglund C. Study concepts and study design: Alabi RO, Elmusrati M, Almangush A, Coletta RD, Salo T, Leivo I. Data acquisition and quality control of data: Sawazaki‐Calone I, Kowalski LP, Leivo I. Data analysis and interpretation: Alabi RO, Elmusrati M, Almangush A, Sawazaki‐Calone I, Mäkitie AA, Salo T, Leivo I. Manuscript preparation: Alabi RO, Elmusrati M, Almangush A, Mäkitie AA, Coletta RD. Manuscript review:
Declaration of Competing Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported by the School of Technology and Innovations, University of Vaasa Scholarship Fund; the Turku University Hospital Research Fund; the Helsinki University Hospital Research Fund; and the Finnish Cancer Society.
References (43)
- et al. Rising incidence of oral tongue cancer among white men and women in the United States, 1973–2012. Oral Oncol. (2017)
- et al. Prognostic evaluation of oral tongue cancer: means, markers and perspectives (I). Oral Oncol. (2010)
- et al. Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst. Appl. (2014)
- et al. Predicting breast cancer survivability: a comparison of three data mining methods. Artif. Intell. Med. (2005)
- et al. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int. J. Med. Inform. (2017)
- et al. Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma. Oral Oncol. (2019)
- et al. Tumour thickness predicts cervical nodal metastases and survival in early oral tongue cancer. Oral Oncol. (2003)
- et al. A simple novel prognostic model for early stage oral tongue cancer. Int. J. Oral Maxillofac. Surg. (2015)
- et al. Prognostic impact of perineural invasion in early stage oral tongue squamous cell carcinoma: results from a prospective randomized trial. Surg. Oncol. (2018)
- et al. Analysis of clinicopathological risk factors for locoregional recurrence of oral squamous cell carcinoma – retrospective analysis of 517 patients. J. Cranio-Maxillofacial Surg. (2017)
- Predictors for tumor recurrence after primary definitive surgery for oral cancer. J. Oral Maxillofac. Surg.
- Recurrence rate and shift in histopathological differentiation of oral squamous cell carcinoma – a long-term retrospective study over a period of 13.5 years. J. Cranio-Maxillofacial Surg.
- Head and neck cancer, quality of life, and determinant factors: a novel approach using decision tree analysis. Oral Surg. Oral Med. Oral Pathol. Oral Radiol.
- Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett.
- Changing epidemiology of oral squamous cell carcinoma of the tongue: a global study: changing epidemiology of tongue cancer. Head Neck
- Poor prognosis in patients with stage I and II oral tongue squamous cell carcinoma. Cancer
- Improved outcomes with oral tongue squamous cell carcinoma in Finland: oral tongue carcinoma in Finland. Head Neck
- For early-stage oral tongue cancer, depth of invasion and worst pattern of invasion are the strongest pathological predictors for locoregional recurrence and mortality. Virchows Arch.
- Analysis of SEER dataset for breast Cancer diagnosis using C4.5 classification algorithm. Int. J. Adv. Res. Comput. Commun. Eng.
- Comparison between artificial neural network and Cox regression model in predicting the survival rate of gastric cancer patients. Biomed. Rep.
- Knowledge extraction from prostate cancer data. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS’06)
1. The last two authors contributed equally.