Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer

https://doi.org/10.1016/j.ijmedinf.2019.104068

Abstract

Background

Accurate estimation of the risk of recurrence in early-stage oral tongue squamous cell carcinoma (OTSCC) is essential for individualized treatment decision-making. However, this remains a challenge even for experienced multidisciplinary centers.

Objectives

We compared the performance of four machine learning (ML) algorithms for predicting the risk of locoregional recurrences in patients with OTSCC. These algorithms were Support Vector Machine (SVM), Naive Bayes (NB), Boosted Decision Tree (BDT), and Decision Forest (DF).

Materials and methods

The study cohort comprised 311 cases from the five University Hospitals in Finland and A.C. Camargo Cancer Center, São Paulo, Brazil. To compare the algorithms, we used the F1 score (the harmonic mean of precision and recall), specificity, and accuracy. These algorithms and their corresponding permutation feature importance (PFI) with the input parameters were externally tested on 59 new cases. Furthermore, we compared the performance of the algorithm that showed the highest prediction accuracy with the prognostic significance of depth of invasion (DOI).
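The three comparison metrics named above can be computed from the counts of a binary confusion matrix. The following is a minimal sketch; the function name `classification_metrics` and the example counts are illustrative assumptions and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity, precision, recall and F1
    from binary confusion-matrix counts (recurrence = positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Specificity: share of non-recurrent cases correctly labelled negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not study data).
m = classification_metrics(tp=8, fp=2, tn=6, fn=4)
print(m)
```

Reporting specificity alongside F1 is useful here because F1 ignores true negatives, so a classifier could score well on F1 while misclassifying many non-recurrent cases.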

Results

The average specificity of all the algorithms was 71%. The SVM showed an accuracy of 68% and F1 score of 0.63, NB an accuracy of 70% and F1 score of 0.64, BDT an accuracy of 81% and F1 score of 0.78, and DF an accuracy of 78% and F1 score of 0.70. Additionally, these algorithms outperformed the DOI-based approach, which gave an accuracy of 63%. With PFI-analysis, there was no significant difference in the overall accuracies of three of the algorithms; PFI-BDT accuracy increased to 83.1%, PFI-DF increased to 80%, PFI-SVM decreased to 64.4%, while PFI-NB accuracy increased significantly to 81.4%.

Conclusions

Our findings show that the best classification accuracy was achieved with the boosted decision tree algorithm. Additionally, these algorithms outperformed the DOI-based approach. Furthermore, even with only the few parameters identified in the PFI analysis, the ML techniques were still able to predict locoregional recurrence. Applying the boosted decision tree algorithm can stratify OTSCC patients and thus aid in their individual treatment planning.

Introduction

Oral tongue squamous cell carcinoma (OTSCC) refers to squamous cell carcinoma that arises from the anterior two thirds of the tongue (also known as mobile tongue). It is usually reported as part of oral squamous cell carcinoma (OSCC), which includes all anatomical subsites of the oral cavity. A recent international study including 22 registries reported 89,212 incident cases of OTSCC and an increasing annual incidence [1], which has been confirmed by others [2]. The primary treatment of choice for OTSCC is surgical excision. However, even early-stage tumors may express a pattern of aggressive behavior [3,4]. Thus, OTSCC with aggressive behavior and those with advanced stage require multimodality treatment including neck dissection and adjuvant (chemo)radiotherapy. Therefore, it is important to precisely estimate the clinical behavior and outcome of OTSCC. Predicting the risk of recurrences is one of the important assessments for the clinician during treatment planning. More importantly, early diagnosis and predicting the risk of recurrences form a milestone in the management of OTSCC as the recent analysis of Finnish cases reported that about 67% of OTSCC cases were diagnosed at an early stage (I-II) [5]. With accurate and timely recurrence prediction, high-risk cases of OTSCC can be identified and multimodality treatment applied accordingly. In a large cohort of early OTSCC, about one fourth of cases (27.8%) developed a recurrence, and all of them might have benefitted from early prediction and corresponding treatment planning [6].

Many recent studies have examined the use of machine learning (ML) techniques for prognostication of different cancers [7,8]. Interestingly, predicting patient outcome by ML techniques has shown better accuracy than Cox regression [9]. This is why ML has been an active research focus in recent years. For instance, ML techniques have been used to predict the outcome of various cancer types [10,11,12], and a web-based tool that uses an artificial neural network to predict outcome in cancer has been reported [13].

In this study, we examined four different ML algorithms, namely, support vector machine (SVM), naive Bayes (NB), boosted decision tree (BDT), and decision forest (DF), in terms of their performance in predicting locoregional recurrence in OTSCC patients. The predictive performance of these algorithms after permutation feature importance (PFI) analysis was also evaluated. Many researchers have used this approach for comparing ML techniques for survival prediction in different malignancies such as breast and lung cancers [14,15,16,17]. Tapak et al. examined six ML algorithms and two traditional methods for the prediction of breast cancer survival and metastasis [15]. In our study, we aimed to identify the best algorithm for classifying patients as being at either low or high risk of OTSCC recurrence. The algorithm with the overall best classification performance was further compared to a recently reported risk model based on the depth of invasion (DOI) [18]. This comparison was motivated by the fact that a DOI of 4 mm or deeper has been considered to be a factor that accurately predicts locoregional recurrence [6]. Moreover, the recent American Joint Committee on Cancer (AJCC) 8th edition incorporated DOI into T-stage [19]. Similarly, the study by Almangush et al. suggested that DOI is one of the strongest pathological predictors for locoregional recurrence [6]. This suggestion is in agreement with reports by others [20,21].
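A DOI-based comparison of the kind described above can be pictured as a single-threshold rule. The sketch below assumes the baseline simply labels a case high-risk when DOI is 4 mm or deeper; the cited risk model [18] may differ in detail, and the function names and example cases are hypothetical.

```python
DOI_CUTOFF_MM = 4.0  # cutoff mentioned in the text: DOI of 4 mm or deeper


def doi_risk_group(doi_mm, cutoff=DOI_CUTOFF_MM):
    """Assumed single-threshold rule: high risk at or above the cutoff."""
    return "high" if doi_mm >= cutoff else "low"


def accuracy(predicted, observed):
    """Fraction of cases where the prediction matches the observed outcome."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical (doi_mm, recurred) pairs for illustration only.
cases = [(2.0, False), (4.0, True), (6.5, True), (3.1, True), (7.0, False)]
predicted = [doi_risk_group(doi) == "high" for doi, _ in cases]
observed = [recurred for _, recurred in cases]
baseline_accuracy = accuracy(predicted, observed)
print(baseline_accuracy)
```

A rule like this uses one input parameter, which is why it serves as a natural reference point for classifiers trained on several clinicopathologic variables.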

We hypothesize that the application of the above-mentioned supervised learning classifiers may be used in the prediction of OTSCC locoregional recurrences and will thereby add value for the management of OTSCC.
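The idea behind the permutation feature importance (PFI) evaluation mentioned above can be sketched in plain Python: shuffle one input column at a time and record how much the accuracy falls. This is a generic illustration, not the Azure Machine Learning Studio implementation used in the study; the toy model, feature layout, and data are assumptions.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model output equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that looks only at feature 0 (a DOI-like value)."""
    return row[0] >= 4.0


# Hypothetical rows of (doi_mm, age) with recurrence labels; not study data.
rows = [(1.0, 50), (2.0, 61), (3.0, 70), (5.0, 45), (6.0, 66), (8.0, 58)]
labels = [False, False, False, True, True, True]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the toy model ignores the second feature, shuffling it leaves the accuracy unchanged and its importance is zero, whereas shuffling the first feature degrades the predictions. Features with near-zero importance are candidates for removal when building a reduced-parameter model.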

Patients

We used data from a study cohort comprising patients treated at the five Finnish University Hospitals of Helsinki, Oulu, Turku, Tampere, and Kuopio and at the A.C. Camargo Cancer Center, São Paulo, Brazil. This is a multicenter study from six institutions, and for many cases the data were provided as locoregional recurrences without further specification. The clinicopathologic characteristics of this cohort have been previously reported and summarized [22]. The primary treatment for all cases was surgical

The training-validation phase for the algorithms in Microsoft Azure for prediction of recurrence

Microsoft Azure Machine Learning Studio (Azure ML 2019) was used in this study [27]. The data was preprocessed to handle missing values. The input parameters were age, gender, stage, grade, tumor budding, depth of invasion (DOI), worst pattern of invasion (WPOI), lymphocytic host response (LHR), perineural invasion (PNI) and treatment given, while the target output was locoregional recurrence. Disease-free survival (DFS) time of the cases ranged from 1 to 267 months. Specifically, the DFS in

Data description

The study cohort included 311 patients with cT1-T2cN0M0 OTSCC: 165 men and 146 women, giving a male-to-female ratio of 1.1:1. Of these 311 cases, 57 lacked information on postoperative treatment. These cases were therefore excluded, and the machine learning training was performed with the remaining 254 cases. These comprised 141 men and 113 women; the mean age at diagnosis was 61.51 years (SD ± 14.81; range 10–95) and the median age was 62.0 years. The distribution

Discussion

The present study compared the performance of ML algorithms in stratifying patients with OTSCC into low- or high-risk groups for recurrence. In this regard, four ML algorithms, namely, boosted decision tree, naive Bayes, support vector machine, and decision forest, were examined. We found that the performance of these techniques was higher than that of the depth of invasion (DOI)-based approach. Our multicenter cohort of cases is one of the largest published series. Majority of the previous publications

Authors' contributions

Institutional Coordinators: Salo T, Coletta RD, Kowalski LP, Leivo I, Mäkitie AA, Haglund C. Study concepts and study design: Alabi RO, Elmusrati M, Almangush A, Coletta RD, Salo T, Leivo I. Data acquisition and quality control of data: Sawazaki‐Calone I, Kowalski LP, Leivo I. Data analysis and interpretation: Alabi RO, Elmusrati M, Almangush A, Sawazaki‐Calone I, Mäkitie AA, Salo T, Leivo I. Manuscript preparation: Alabi RO, Elmusrati M, Almangush A, Mäkitie AA, Coletta RD. Manuscript review:

Declaration of Competing Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was funded by the School of Technology and Innovations, University of Vaasa Scholarship Fund; the Turku University Hospital Research Fund; the Helsinki University Hospital Research Fund; and the Finnish Cancer Society.

References (43)

  • I. Vázquez-Mahía et al. Predictors for tumor recurrence after primary definitive surgery for oral cancer. J. Oral Maxillofac. Surg. (2012)
  • M.A. Ermer et al. Recurrence rate and shift in histopathological differentiation of oral squamous cell carcinoma – a long-term retrospective study over a period of 13.5 years. J. Cranio-Maxillofacial Surg. (2015)
  • N.B. de Melo et al. Head and neck cancer, quality of life, and determinant factors: a novel approach using decision tree analysis. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. (2018)
  • B. Zhang et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. (2017)
  • J.H. Ng et al. Changing epidemiology of oral squamous cell carcinoma of the tongue: a global study: changing epidemiology of tongue cancer. Head Neck (2017)
  • K. Rusthoven et al. Poor prognosis in patients with stage I and II oral tongue squamous cell carcinoma. Cancer (2008)
  • R. Mroueh et al. Improved outcomes with oral tongue squamous cell carcinoma in Finland: oral tongue carcinoma in Finland. Head Neck (2017)
  • A. Almangush et al. For early-stage oral tongue cancer, depth of invasion and worst pattern of invasion are the strongest pathological predictors for locoregional recurrence and mortality. Virchows Arch. (2015)
  • S. Anand et al. Analysis of SEER dataset for breast Cancer diagnosis using C4.5 classification algorithm. Int. J. Adv. Res. Comput. Commun. Eng. (2012)
  • L. Zhu et al. Comparison between artificial neural network and Cox regression model in predicting the survival rate of gastric cancer patients. Biomed. Rep. (2013)
  • D. Delen et al. Knowledge extraction from prostate cancer data. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS’06) (2006)
1 The last two authors have equal contributions.
