Introduction

Outpatient no-shows (patients who fail to attend their scheduled appointments) remain a persistent problem for healthcare systems [1, 2]. No-shows typically increase healthcare costs, leave medical resources underutilized and affect patient care [1, 3, 4]. Accurately projecting no-shows is therefore key to containing healthcare costs and improving system efficiency [3], and healthcare organizations must consider the probability of a patient no-show when scheduling appointments [3]. The performance of traditional strategies such as overbooking is not consistently high, because it depends strongly on the actual pattern of no-shows. In contrast, using machine learning to predict the probability of a no-show can guide decisions toward more reliable appointment scheduling strategies [5]. Identifying the patients who are most likely to miss their appointments can help a facility plan better and improve care. It is worth noting that no-shows have traditionally been analyzed using historical data. Prediction techniques in other fields, such as economics, rest on a well-established foundation of scientific findings and a long history. However, these techniques remain uncommon in healthcare, especially in the public sector [6].

A good starting point is to explore the factors that affect the no-show rate, based on the information available for both patients and appointments. Knowing patients' behaviour in advance is important so that clinics can react accordingly [6]. The databases involved are large: across all Ministry of National Guard Health Affairs (MNGHA) facilities they can exceed one million appointments. The emergence of machine learning techniques, together with big data analytics, has made this study possible. Machine learning, a branch of artificial intelligence, has been widely used by the research community to turn large, varied and heterogeneous data sources into high-quality knowledge. It thereby helps healthcare providers maximize efficiency and discover cost-effective opportunities, which they regard as major pillars [6, 7], and it offers strong capabilities for pattern discovery and risk-factor identification. However, applying machine learning techniques to complex big data is computationally expensive, requiring massive computing resources in terms of storage, memory and CPU. A platform for big data analysis therefore becomes more important as the amount of data grows. Apache Spark is one such platform, and its MLlib library provides implementations of many machine learning techniques. In this contribution, we examine big data machine learning from a computational perspective [7].

To handle increasing demand and compensate for missed appointments, this paper provides a framework that uses big data to explore the factors influencing outpatient no-shows and to develop predictive models. We explore the power of big data machine learning for this task.

Related work

Several studies have focused on various aspects of no-shows in hospitals and documented efforts to reduce the no-show rate. Blumenthal et al. developed a model to predict no-shows for scheduled colonoscopies. The model applied natural language processing (NLP) to historical medical records and the endoscopy scheduling system, and achieved an AUC of 0.702 with a sensitivity of 33% and a specificity of 92% [8]. Kurasawa et al. used logistic regression to predict missed appointments among diabetes patients; the best model achieved an AUC of 0.958, with precision, recall and F-measure of 0.757, 0.659 and 0.704, respectively [9]. Devasahay et al. merged historical appointment data with a distance variable to predict no-show patients. They ran logistic regression (LR), support vector machine (SVM) and recursive partitioning to build predictive models. The best model was a decision tree, with a sensitivity of 23.22% and a positive predictive value (PPV) of 15.58% at a cut-off of 0.15; they were not able to accurately predict which types of patients would miss appointments [10]. Goffman used logistic regression to model demographic and appointment characteristics together with the history of patient behaviour; the model identified no-show patients with an average AUC of 0.71 [11]. Harvey et al. used logistic regression to determine whether patients attended their appointments in a radiology department. The model considered 16 associated factors and achieved an AUC of 0.75. In a further analysis by modality, the AUCs were 0.74 for CT, 0.78 for mammography and 0.75 for both MRI and ultrasound [12]. Elvira et al. proposed a model that used the Gradient Boosting (GB) algorithm to predict the probability of a no-show; their best result was an AUC of 0.74 [6].
Srinivas and Ravindran proposed a framework for developing no-show prediction models and then proposed scheduling rules, using healthcare data from various sources. Among the five machine learning algorithms used, stacking performed best, with an AUC of 0.846. They further integrated the no-show risk obtained from the stacking model into the scheduling rules, which improved operational performance compared with the traditional overbooking approach [13]. Mohammadi et al. proposed three machine learning models to predict no-shows at the next medical appointment. Naïve Bayes had the highest overall accuracy (82%), and the AUCs for logistic regression, naïve Bayes and multilayer perceptron were 0.81, 0.86 and 0.66, respectively [14]. Dantas et al. developed a logistic regression prediction model with an accuracy of 71% to explore the factors related to no-show rates. They found that the factors significantly associated with no-shows in a bariatric surgery clinic were specialty, lead-time, the hour and month of the appointment, previous appointment and no-show history, type of appointment and distance [15]. Nelson et al. (2019) proposed predictive models for imaging appointments, comparing algorithms that included logistic regression, support vector machines, random forests and AdaBoost; gradient boosting models achieved the best performance, with an AUC of 0.852 and a precision of 0.511 [16]. AlMuhaideb et al. applied the JRip and Hoeffding tree algorithms to historical outpatient scheduling data to build predictive models. The predictive abilities of the JRip and Hoeffding tree models were 76.44% and 77.13%, respectively, with an area under the curve of 0.776 for JRip and 0.861 for the Hoeffding tree [17]. Ahmadi et al. addressed the problem of no-shows and late cancellations for neurology appointments in two stages. First, they identified important features using three algorithms: decision tree (DT), random forest (RF) and naïve Bayes (NB). Second, the features selected in the first stage were used to train a stacking model. Random forest performed better than naïve Bayes and decision tree in both stages, and the NSGA-II3 approaches achieved the highest AUC (0.697) with fewer features [18].

On the other hand, deep learning methods have attracted many researchers and organizations in the healthcare field. Deep learning methods are useful for problems that are difficult to solve with traditional methods. They provide an effective way to deal with high-dimensional, high-volume data, can give a whole picture of the information embedded in large-scale data and can reveal unknown structure. Deep learning has also been reported to predict no-shows well, supporting more effective use of health resources. Nevertheless, there has been very little effort to use deep learning for predicting patient no-shows; we found only one study that used deep learning to predict no-shows in outpatient clinics. Dashtban and Li (2019) presented a novel prediction method for outpatient non-attendance based on a wide range of health, environmental and socioeconomic factors. The model is based on deep neural networks that integrate the data reconstruction and prediction steps using in-hospital data, with the aim of achieving higher performance than a separate classification model. Comparing the proposed model with other machine learning classifiers showed that the deep learning model outperformed the other methods in practice, achieving an AUC of 0.71, a recall of 0.78 and an accuracy of 0.69. Finally, the model was deployed and connected to a reminder system [19].

Method

Data for this study were extracted from the Ministry of National Guard Health Affairs (MNGHA) data warehouse, a large institutional database derived from the Electronic Medical Record (EMR). A total of 2,011,813 records were queried for all outpatient visits scheduled from 2018 to July 2019 in the central region. All cancelled visits were excluded. The no-show factors were categorized into two groups. The first group involves appointment characteristics, such as appointment time, lead-time and distance. The second group relates to the patients themselves, such as age, gender and the history of previous appointments. We also added calculated variables that carry additional information, such as the number of previous appointments, the number of no-show appointments and lead-time (the number of days between the reservation date and the appointment). The final set consisted of 20 attributes, selected and calculated based on domain knowledge and previous work.
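The calculated history variables can be derived from raw appointment records roughly as follows. This is a minimal plain-Python sketch, not the study's Spark pipeline: the `Appointment` record, its field names and the helper `add_history_features` are hypothetical. History counts are taken over each patient's earlier appointments in date order, so a visit's own outcome never leaks into its features, and a new patient naturally starts with zero previous no-shows.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Appointment:
    # Hypothetical record layout; the MNGHA warehouse schema is not described here.
    patient_id: str
    reserved_on: date      # date the booking was made
    appointment_on: date   # scheduled visit date
    no_show: bool          # outcome label


def add_history_features(appointments):
    """Return one feature dict per appointment, ordered by patient and then date.

    History counts cover only appointments processed earlier for the same
    patient, so the current visit's outcome is never part of its own features.
    """
    history = defaultdict(lambda: {"previous": 0, "no_shows": 0})
    rows = []
    for appt in sorted(appointments, key=lambda a: (a.patient_id, a.appointment_on)):
        h = history[appt.patient_id]
        rows.append({
            "patient_id": appt.patient_id,
            "lead_time_days": (appt.appointment_on - appt.reserved_on).days,
            "n_previous": h["previous"],
            "n_previous_no_show": h["no_shows"],
            "n_previous_show": h["previous"] - h["no_shows"],
            "no_show": appt.no_show,
        })
        # Update the running history only after the current row has been emitted.
        h["previous"] += 1
        h["no_shows"] += int(appt.no_show)
    return rows


# Toy usage with made-up appointments.
appts = [
    Appointment("p1", date(2019, 1, 1), date(2019, 1, 20), True),
    Appointment("p1", date(2019, 2, 1), date(2019, 2, 10), False),
    Appointment("p2", date(2019, 3, 1), date(2019, 3, 2), False),
]
for row in add_history_features(appts):
    print(row)
```

In a production setting the same logic would typically be expressed as a window computation over a patient-partitioned, date-ordered table rather than a Python loop.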

The dataset was then pre-processed by eliminating incomplete and incorrect records, handling missing values and resolving inconsistencies. Attributes were transformed by normalization (scaling) and discretization. In normalization, attribute values are rescaled from their original range to [0, 1]. In discretization, the numerical age attribute was transformed into a categorical attribute using five as a cutoff point. We then applied the VectorAssembler function, which combines all columns, both raw and calculated, into a single vector column that can be passed to the ML algorithm [20]. Furthermore, we identified the factors that are most important for prediction and significantly influence model performance. The information gain method was used to rank factors by their impact on whether patients showed or did not show, and to remove irrelevant factors [21].
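The normalization and assembly steps can be illustrated in plain Python. In PySpark the corresponding transformers are `MinMaxScaler` and `VectorAssembler` from `pyspark.ml.feature`; the sketch below only mimics what they compute, on a hypothetical list of row dictionaries with made-up column names, and is not the pipeline used in the study.

```python
def min_max_scale(values):
    """Rescale a numeric column linearly from [min, max] to [0, 1].

    A constant column is mapped to 0.0 here (a choice of this sketch).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows, columns):
    """Return copies of the rows with the given numeric columns min-max scaled."""
    scaled = {c: min_max_scale([r[c] for r in rows]) for c in columns}
    return [{**r, **{c: scaled[c][i] for c in columns}} for i, r in enumerate(rows)]


def assemble(rows, columns):
    """Concatenate the named columns of each row into a single feature vector,
    analogous to the output column of a vector-assembler step."""
    return [[float(r[c]) for c in columns] for r in rows]


# Hypothetical rows; column names are illustrative only.
rows = [
    {"lead_time_days": 0, "n_previous_no_show": 2},
    {"lead_time_days": 30, "n_previous_no_show": 0},
    {"lead_time_days": 15, "n_previous_no_show": 1},
]
features = ["lead_time_days", "n_previous_no_show"]
vectors = assemble(normalize_columns(rows, features), features)
print(vectors)
```

In practice the minimum and maximum should be computed on the training split only and then reused for the test split, so that test data do not influence the scaling.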

As part of our work, we ran an experimental evaluation of Apache Spark and MLlib in Python using PySpark [22]. The study involves five machine learning techniques for the predictive task: Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), Support Vector Machine (SVM) and Multilayer Perceptron (MLP). Random Forest, developed by Breiman, builds on decision-tree classification algorithms. It grows many individual decision trees (a forest of trees), each trained on a random bootstrap sample of the data and each considering only a random subset of the input attributes when choosing a split. For classification tasks, Gini impurity and information gain are the most common criteria for defining the “best split”. Finally, the predictions of the individual trees are combined, by majority vote for classification, to make the final prediction [23, 24, 25]. The idea of Gradient Boosting (GB) originated with Leo Breiman and was developed into an explicit algorithm by Jerome Friedman; it has been used for both regression and classification. It combines a number of weak decision-tree models into a stronger learner. The model is built in a stage-wise manner: each new weak learner is fitted to the negative gradient of the loss of the current model, so that adding it performs a gradient-descent step that reduces the loss. In stochastic gradient boosting, each new learner is fitted to a subsample of the training data drawn at random without replacement, after which the model update for the current stage is computed [26, 27, 28]. Logistic regression (LR) traces back to Quetelet and Verhulst, who first used the logistic function to describe the growth of populations [29]. It is a predictive analysis method used to model the probability of a binary target. Logistic regression can also be extended to multiclass prediction, and non-linear transformations of the features can be included. It computes a linear combination of the inputs and passes it through the logistic function.
Making predictions with logistic regression is easy to implement and provides good results [30]. The support vector machine (SVM) algorithm, proposed by Cortes and Vapnik in 1995, constructs an optimal hyperplane that separates data points into classes based on the features of the training dataset. Among the possible separating hyperplanes, and with different choices of kernel function, the SVM selects the one that maximizes the margin between the data points of the two classes, so that future data points can be classified more accurately. The main advantages of SVM are its effectiveness in high-dimensional spaces and its memory efficiency, since the decision function uses only a subset of the training points, called support vectors [25, 31, 32]. A Multilayer Perceptron (MLP) is an artificial neural network of units called perceptrons. A perceptron computes a single output by applying a non-linear activation function to a linear combination of multiple weighted inputs. Perceptrons are combined into a fully connected network with input and output layers and hidden layers in between [33]. Cybenko and Funahashi showed that networks with a single hidden layer are sufficient to approximate continuous functions to a desired accuracy [34].
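The prediction step shared by logistic regression and a perceptron-based network can be written in a few lines. This is a minimal sketch with hypothetical, untrained weights: it shows only how a linear combination of inputs is passed through the logistic function, and how a one-hidden-layer network stacks such units. Training (fitting the weights) is omitted, and the single sigmoid output unit is a simplification of this sketch rather than a description of MLlib's implementation.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(weights, bias, x):
    """Logistic regression: a linear combination of the inputs passed through the sigmoid."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def mlp_predict(hidden_weights, hidden_biases, out_weights, out_bias, x):
    """One-hidden-layer perceptron network.

    Each hidden unit applies the sigmoid to a weighted sum of the inputs;
    the output unit does the same over the hidden activations.
    """
    hidden = [logistic_predict(w, b, x) for w, b in zip(hidden_weights, hidden_biases)]
    return logistic_predict(out_weights, out_bias, hidden)


# Hypothetical weights for two scaled inputs (e.g. lead-time and prior no-shows).
x = [0.5, 1.0]
p_lr = logistic_predict([0.8, 2.0], -1.5, x)
p_mlp = mlp_predict([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0], [1.5, 2.0], -1.0, x)
print(p_lr, p_mlp)
```

The output is a probability; a visit is labelled as a predicted no-show when it exceeds a chosen cut-off (0.5 is a common default, while Devasahay et al. used 0.15).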

Two validation methods were used to evaluate the models: holdout and tenfold cross-validation. For the holdout method, we used two data splits, in ratios of 70:30 and 80:20 (training:testing). For tenfold cross-validation, the dataset was split into 10 partitions; each partition in turn was used for testing while the other nine were used for training, and the metrics were averaged over the 10 runs. Averaging over the 10 partitions yields an estimate with lower variance and bias than a single holdout split [35]. The metrics used to select the best model were accuracy, precision, recall, F-measure and area under the ROC curve (AUC), as well as training and evaluation time. Each metric is defined as follows:

  • Accuracy: proportion of all visits that are correctly classified.

  • Precision: number of visits correctly classified as positive by the system divided by the number of all visits classified as positive by the system.

    $$\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}}.$$
    (1)
  • Recall: number of visits correctly classified as positive by the system divided by the number of positive visits in the testing set.

    $$\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}.$$
    (2)
  • F-measure: the harmonic mean of precision and recall, which considers both at the same time and represents the balance between them.

    $$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
    (3)
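  • ROC: the receiver operating characteristic curve measures classification performance across threshold settings by plotting the true-positive rate (recall) against the false-positive rate, showing how well the model distinguishes between classes of visits; the area under the curve (AUC) summarizes this in a single number [36].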

  • Time: training and evaluation time of the algorithms.
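The validation procedure and the metrics above can be sketched in plain Python. The sketch is illustrative only: `kfold_indices` is a hypothetical helper that shuffles row indices and deals them into k folds, the metric functions follow Eqs. (1) to (3) with the positive class labelled 1, and the AUC is computed as the probability that a randomly chosen positive visit is scored above a randomly chosen negative one, using a quadratic loop that suits toy data only. At the scale of this study, Spark's own tools (`CrossValidator` in `pyspark.ml.tuning` and `BinaryClassificationEvaluator` in `pyspark.ml.evaluation`) would be the natural choice instead.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # deal shuffled indices into k folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision (Eq. 1), recall (Eq. 2) and F-measure (Eq. 3)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_measure": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve: the chance a random positive outscores a random negative.

    Ties count as one half. Quadratic in the number of rows, so only for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up labels, hard predictions and scores.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
for train, test in kfold_indices(len(y_true), k=3):
    print(len(train), len(test))
```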

Results

A total of 2,011,813 visits were included (mean age 6.38 ± 4.35 years; 61.34% female). There were 537,422 no-show visits and 1,474,391 show visits, giving an overall no-show proportion of 26.71% across all outpatient clinics. Cancelled appointments were not considered. The average lead-time was 19.58 days. Each record contains 20 variables, which are summarized in Table 1. As Table 1 shows, male patients were less likely to miss their appointments than female patients. New patients were the most likely to miss their appointments, followed by follow-up patients. The age distribution of outpatients is shown in Fig. 1.
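As a consistency check, the overall no-show proportion follows directly from the visit counts, with the 537,422 visits being the no-shows:

$$\frac{537{,}422}{537{,}422 + 1{,}474{,}391} = \frac{537{,}422}{2{,}011{,}813} \approx 0.2671, \text{ i.e. } 26.71\%.$$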

Table 1 Descriptive characteristics of the dataset (N = 2,011,813)
Fig. 1 Age distribution of outpatients

The feature importance analysis identified the top four predictors as the number of no-show appointments, medical department, lead-time and the number of show appointments. The next four most important predictors were appointment type, patient type, outpatient clinic and appointment month. Appointment year, distance, gender, reservation type and nationality were not important predictors and were therefore removed from the models. The remaining factors, such as the number of scheduled appointments, the number of walk-in appointments, appointment time and age, had less influence on no-shows. Factors related to patients had more impact on no-shows than factors related to appointments. The factors were ranked according to their calculated information gain; Fig. 2 shows the factors ranked by importance. The prediction models were developed using only 14 factors.
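The information-gain ranking can be illustrated with a small plain-Python sketch. It assumes categorical features (a continuous attribute such as lead-time would first have to be binned), measures entropy in bits, and ranks hypothetical toy columns; it is not the implementation used in the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the weighted entropy left after splitting on a categorical feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data: 1 = no-show, 0 = show. Column names are hypothetical.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "had_prior_no_show": ["yes", "yes", "no", "no", "yes", "no"],  # separates the classes perfectly
    "gender": ["F", "M", "F", "M", "F", "M"],
}
ranking = rank_features(columns, labels)
print(ranking)
```

Features whose gain is close to zero carry little information about the outcome, which is the basis for dropping them before model training.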

Fig. 2 Feature importance ranking of factors in the developed machine learning models

We evaluated the models using different validation methods and various evaluation metrics. In general, all models performed similarly on the evaluation metrics, with the exception of time. Tables 2, 3 and 4 report the results of the experiments carried out to assess the performance of Spark with five machine learning algorithms on the same large dataset. We evaluated all classifiers in terms of time to train and evaluate the models, accuracy, precision, recall, F-measure and ROC. MLP and RF classified visits well, with comparable values on all metrics; MLP showed a greater improvement in F-measure than RF. LR and SVM had similar ROC performance, but LR is preferred over SVM because it performed better on all metrics with less computation. SVM likely performs poorly because of the kernel limitation in MLlib: only a linear kernel is available for the SVM algorithm. GB performed best, with an accuracy of 79% and an AUC of 0.81.

Table 2 Evaluation metrics of the different models for predicting outpatient no-shows using the 70/30 holdout method
Table 3 Evaluation metrics of the different models for predicting outpatient no-shows using the 80/20 holdout method
Table 4 Evaluation metrics of the different models for predicting outpatient no-shows using tenfold cross-validation

Fig. 3 presents the ROC curves of the five models, illustrating the trade-off between true-positive and false-positive rates for each classifier. Each model achieved the same ROC across the different validation methods. The plot shows that Gradient Boosting is the best model (area = 0.81). SVM with a linear kernel and logistic regression returned comparable classification results. MLlib currently supports only linear SVMs; an SVM with a non-linear kernel might outperform logistic regression.

Fig. 3 ROC of the developed machine learning models

As an additional evaluation criterion, we recorded the overall training and test time (in seconds) of all five algorithms, as shown in Tables 5 and 6. Since performance is close on all other metrics, time is the key factor for selecting the best validation method. Unlike the other metrics, time differed considerably between the algorithms, especially training time. GB achieved its best performance using the 70:30 holdout method, which required significantly less training time than the other validation methods. For the 70:30 holdout method, GB is around 15 times slower than MLP, although it achieved the best results. SVM, whose performance is close to that of LR, takes about 68 times as long as LR to train. Logistic regression is 4 times faster than the next two most accurate algorithms, MLP and RF, with comparable performance. For large datasets, time is a reason to select one of the quicker algorithms, bearing in mind that the time a model takes depends on the choice of algorithm parameters. Exploring and evaluating the performance of machine learning models with various evaluation methods is therefore critical, as the results can differ significantly (Table 6).
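Training and evaluation times of the kind reported in Tables 5 and 6 can be collected with a small timing wrapper and expressed relative to a baseline model, which is how statements such as "15 times slower" are obtained. This sketch is hypothetical: `timed` and `relative_cost` are illustrative helpers, the function passed to `timed` stands for whatever trains or evaluates a model, and the timings in the usage example are made-up numbers, not the study's results.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def relative_cost(times, baseline):
    """Express each model's time as a multiple of the baseline model's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


# Stand-in workload; in practice fn would be a model's training or evaluation step.
total, seconds = timed(sum, range(1_000_000))

# Made-up timings in seconds, only to show the arithmetic behind "k times slower".
ratios = relative_cost({"LR": 2.0, "MLP": 8.0, "GB": 120.0}, baseline="MLP")
print(seconds, ratios)
```

With Spark, note that `transform` is lazy: an evaluation timing should include an action that forces computation (for example, computing the evaluation metric), not only the `transform` call.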

Table 5 Training time of each machine learning model (seconds)
Table 6 Test time of each machine learning model (seconds)

Model deployment

The age of big data in healthcare is here, and these are truly revolutionary times to move from standard regression-based methods to more future-oriented approaches such as predictive analytics, machine learning and graph analytics. The goal is to lead the way in supporting data-driven predictive tools and to catch up with other industries. The developed predictive model was adopted in practice in a pilot phase led by the Information System and Informatics Division (ISID) at MNGHA. The implementation of the no-show model presents the prediction results in a meaningful way to support decision-making. Figure 4 shows a screenshot of the dashboard that data scientists use to monitor model performance and accuracy. The dashboard summarizes and visualizes the no-show information in three main sections. The first section compares the actual data trend with the predicted data. The second and third sections present descriptive results showing actual no-shows by department and by service. Fig. 5 shows the weekly prediction dashboard, which presents the number of patients predicted to be no-shows each week. This enables timely action to control the no-show rate and thus reduce operating costs and waste. Using a predictive tool to improve clinic outcomes is achievable.

Fig. 4 Dashboard for monitoring the model performance and accuracy by data scientist

Fig. 5 Dashboard for weekly prediction of no-show cases

Discussion

In this study, we set out to identify the key factors for predicting patients who will not attend their appointments (no-shows) using routinely available hospital data. The literature on predicting no-shows shows that logistic regression has been the main technique used to identify the factors influencing no-show behaviour. To the best of our knowledge, no existing work in no-show big data analytics has used the time taken by a model as a criterion for evaluating it. Moreover, there are few publications on predicting no-show behaviour with a big data machine learning approach, and Spark has not been widely used for predicting outpatient no-shows, particularly on Saudi health data. This study analyzed a unique and rich dataset of 2,011,813 visits, collected from patient EHR data, to explore the factors used to make predictions with big data machine learning techniques. Big data technology is a remarkable field with a bright future that can bring many impacts and innovations if approached correctly. Accordingly, this work provides organizations with a case of a big data tool, analytic method and technology that can be applied, opening wide opportunities for more advanced big data analytics solutions that support decision-making. Future research can therefore focus on providing a big data framework that addresses the challenges of dealing with big data [37].

Compared with other studies, such as those by Elvira et al. (2018) and Nelson et al. (2019), our performance is comparable: Elvira et al. report an AUC of 0.74 and Nelson et al. report 0.85, and both achieved their highest performance with the Gradient Boosting algorithm. This study contributes to the existing literature by considering time, in terms of training and evaluation time, when evaluating the models. Processing and analyzing big data effectively is crucial for organizations aiming for a leading role in the healthcare field, and modeling big data in this way helps overcome many difficulties faced by traditional methods. In this respect, this study has introduced a model-driven method to determine which algorithm performs best on big data and can scale to massive data. Compared with Dashtban and Li (2019), who reported an AUC of 0.71 and an accuracy of 0.69 for deep neural networks on their data, our GB model showed a clearer gain, with an AUC of 0.81 and an accuracy of 0.79. Our framework can be extended, both theoretically and practically, in future work by applying a deep learning approach to our dataset.

Several studies have focused on the reasons for patients' no-shows. Forgetfulness is the main reason, along with others such as mistakes and misunderstandings [38]. Other important factors for no-show were difficulty in booking, work commitments, distance and seeking care at another healthcare facility [39]. Transportation is a key factor, in addition to environmental factors that affect patient attendance and are valuable in predicting no-shows, including weather, distance, socioeconomic status and the number of attended previous appointments [40]. Our results confirm previous findings by Dantas et al. (2019) that lead-time and the number of previous shows and no-shows are important factors in appointment attendance. A longer time between the booking date and the appointment date was associated with a higher no-show rate. The results of this study suggest that no-show rates among outpatients might be reduced by reviewing lead-time, especially since it is one of the factors controlled by clinics. As we have demonstrated, some medical departments experienced a high risk of no-show, such as the diabetes department, which gives daily appointments for insulin injections. Knowing the factors associated with no-show can help improve quality of care and support efforts to control the factors that can be changed to reduce the no-show rate. This would have a direct practical and financial impact on healthcare [39].

Previous studies make clear that various interventions can be highly successful in reducing the negative impact of no-shows [41]. A study by Goffman reported a reduction in the no-show rate from 35% to 12.16%, with patients predicted to be no-shows receiving a reminder call 24, 48 and 72 h before their appointments [11]. Arora et al. evaluated automated text-message reminders for increasing attendance at follow-up appointments among patients discharged from the emergency department, reporting a follow-up attendance rate of 72.6% in the reminder group compared with 70.2% in the control group [42]. Cancellation policies are another intervention strategy for reducing patient no-shows and are important for service operations, since clinics can use cancelled slots to reschedule appointments. The findings indicate that when fill rates are low and no-show probabilities are high, the notice period required for patients to cancel appointments needs to increase for the policy to remain cost-effective [43]. A number of healthcare systems have implemented SMS text-message reminders, which show promise as an instant, simple and cost-effective means of communication that protects patient privacy. However, sending SMS messages to all patients with scheduled appointments is not free. A prediction system could limit SMS reminders to patients predicted to be no-shows, reducing costs without affecting attendance rates [44, 45].

A real-world implementation of the model validated our findings and assessed the efficiency of the scheduling policy on patients' no-show behaviour over time. One consideration when implementing the model is the patient's history, which must be kept up to date; a reasonable approach is to automate the calculation of the important features for patients who have appointments in the following week. Another consideration is how to handle new patients: all new patients are assigned zero missed appointments until their behaviour indicates otherwise. The most important question from an implementation standpoint is how to respond when a patient is predicted to be a no-show. This decision ultimately rests with the facility; MNGHA fully intends to use this machine learning model in production to provide proactive responses and recommendations and to determine interventions that reduce the no-show rate [2]. Advanced real-time predictive analytics remains an open question for future research. Moreover, various other factors could be explored and used for predicting no-shows; there appears to be plenty of room for improvement by adding features such as medication refills, lab appointments or special clinic orders. Further studies are required to investigate the extent of the economic consequences of patient no-shows and to explore the factors that may modulate no-show rates. Finally, the Spark cluster was set up using a single node; further analysis using multiple nodes is recommended.

Conclusion

In this study, big data analytics has been shown to provide prediction capabilities in healthcare. We examined how valuable insights can be gained from unique and rich data to support decision-making. Such value can be provided by machine learning techniques, which have recently gained much interest and are of great significance in this era of health data. The contribution of this paper is to explore the factors related to the risk of no-show in order to stratify patients in outpatient clinics with respect to this risk, and to present an evaluation of five machine learning techniques on the Spark platform for predicting patients' no-shows. Determining the associated risks and predicting no-shows is a challenging undertaking. The resulting model can be used to improve clinics' resource utilization and access to care.