Review
Machine learning and deep learning methods that use omics data for metastasis prediction

https://doi.org/10.1016/j.csbj.2021.09.001
Under a Creative Commons license
open access

Abstract

The recognition that metastasis is the primary cause of cancer-related deaths has incentivized research directed towards unraveling the complex cellular processes that drive metastasis. Advances in technology, specifically the advent of high-throughput sequencing, have provided knowledge of such processes. This knowledge has led to the development of therapeutic and clinical applications, and is now being used to predict the onset of metastasis to improve diagnostics and disease therapies. In this regard, predicting metastasis onset has also been explored using artificial intelligence approaches based on machine learning and, more recently, deep learning. This review summarizes the machine learning and deep learning-based metastasis prediction methods developed to date. We also detail the types of molecular data used to build the models and the critical signatures derived from the different methods. We further highlight the challenges associated with using machine learning and deep learning methods, and provide suggestions to improve the predictive performance of such methods.

Keywords

Cancer
Metastasis
Machine learning
Deep learning
Artificial intelligence

Abbreviations

Acc: Accuracy
AE: autoencoder
ANN: Artificial Neural Network
AUC: area under the curve
BC: Betweenness centrality
BH: Benjamini-Hochberg
BioGRID: Biological General Repository for Interaction Datasets
CCP: compound covariate predictor
CEA: Carcinoembryonic antigen
CNN: convolution neural networks
CV: cross-validation
DBN: deep belief network
DDBN: discriminative deep belief network
DEGs: differentially expressed genes
DIP: Database of Interacting Proteins
DNN: Deep neural network
DT: Decision Tree
EMT: epithelial-mesenchymal transition
GA: Genetic Algorithm
GANs: generative adversarial networks
GEO: Gene Expression Omnibus
HCC: hepatocellular carcinoma
HPRD: Human Protein Reference Database
FC: fully connected
k-CV: k-fold cross validation
KNN: K-nearest neighbor
LIMMA: linear models for microarray data
LOOCV: Leave-one-out cross-validation
LR: Logistic Regression
L-SVM: linear SVM
MCCV: Monte Carlo cross-validation
MLP: multilayer perceptron
mRMR: minimum redundancy maximum relevance
NPV: negative predictive value
PCA: Principal component analysis
PPI: protein-protein interaction
PPV: positive predictive value
RC: ridge classifier
RF: Random Forest
RFE: recursive feature elimination
RMA: robust multi‐array average
RNN: recurrent neural networks
Se: sensitivity
SGD: stochastic gradient descent
SMOTE: synthetic minority over-sampling technique
Sp: specificity
SVM: Support Vector Machine
TCGA: The Cancer Genome Atlas

1 Shared first author.