Introduction

The novel coronavirus disease (COVID-19) has spread rapidly throughout the world from Wuhan (Hubei, China) since December 2019 [2,3,4,5,6]. Since the outbreak began, the number of reported cases has surpassed 12 million, with more than 550 thousand deaths worldwide as of 12 July 2020 [7]. COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a member of the coronavirus family. On 11 March 2020, COVID-19 was declared a pandemic by the World Health Organization (WHO) [8]. Due to the pandemic, hospital capacity is being exceeded in many places, and facilities face shortages of medical staff, personal protective equipment, life-support equipment, and other resources [9, 10]. Symptoms of COVID-19 are non-specific; infected individuals may develop fever (83–99%), cough (59–82%), loss of appetite (40–84%), fatigue (44–70%), shortness of breath (31–40%), sputum production (28–33%), or muscle aches (11–35%) [11]. The disease can progress further into severe pneumonia, acute respiratory distress syndrome (ARDS), myocardial injury, sepsis, septic shock, and even death [12]. Although most COVID-19 patients have a mild illness, some show rapid deterioration (particularly within 7–14 days of symptom onset) into severe COVID-19 with or without ARDS [13, 14]. Current epidemiological data suggest that the mortality rate of patients with severe COVID-19 is higher than that of patients with non-severe COVID-19 [15, 16]. It has been reported that 26.1–32.0% of patients infected with COVID-19 are prone to progressing to critical illness [17]. Recent studies have confirmed a high fatality rate of 61.5% for critical cases, which increases with age and other medical comorbidities [17].

A large cohort study of 2449 patients reported that during this pandemic, the healthcare system can be overwhelmed by hospitalization (20–31%) and intensive care unit (ICU) admission rates (4.9–11.5%) [18]. This can be avoided by prioritizing hospital treatment for patients at high risk of deterioration and death, and by treating low-risk patients in ambulatory settings or through home-based self-quarantine. An effective tool is therefore needed to predict the disease trajectory, allocate resources efficiently, and improve patient outcomes. Given the potential of this approach, it is important to identify key patient variables that can help predict the course of the disease at diagnosis. In other words, early identification of patients at high risk of progression to severe COVID-19 will support efficient utilization of healthcare resources through patient prioritization and thereby reduce the mortality rate.

Several studies indicate that biomarkers can help to classify COVID-19 patients at elevated risk of serious disease and mortality by providing crucial information about the patients’ health status. Al Youha et al. [19] proposed a prognostic model called the Kuwait Progression Indicator (KPI) score for predicting the progression of COVID-19 severity. Unlike scoring systems based on self-reported symptoms and other subjective parameters, the KPI model was based on quantifiable laboratory readings. The KPI score categorizes patients as low risk if the score falls below − 7 and high risk if it rises above 16; however, the authors deemed the progression risk in the intermediate group (scores from − 6 to 15) uncertain. Such an intermediate category exists in many prognostic systems. Weng et al. [20] reported an early prediction score called ANDC to predict mortality risk for COVID-19 patients using data from 301 adult patients. Least absolute shrinkage and selection operator (LASSO) regression identified age, neutrophil-to-lymphocyte ratio (NLR), D-dimer, and C-reactive protein (CRP) recorded at admission as mortality predictors for COVID-19 patients [20]. They developed a nomogram demonstrating good performance and derived an integrated score, ANDC, with its corresponding death probability. They also derived cutoff ANDC values to classify COVID-19 patients into low-, moderate-, and high-risk groups, with death probabilities of less than 5%, 5–50%, and more than 50%, respectively. Using a cohort of 444 patients, Xie et al. [21] proposed a prognostic model using lactate dehydrogenase, lymphocyte count, age, and SpO2 as key predictors of COVID-19-related death. The model showed good discrimination in internal and external validation, with C-statistics of 0.89 and 0.98, respectively. However, although the model showed promising internal calibration, external validation revealed over-prediction for low-risk patients and under-prediction for high-risk patients.

Yan et al. [1] reported a machine learning approach that selected three biomarkers (lactic dehydrogenase (LDH), lymphocyte, and high-sensitivity C-reactive protein (hs-CRP)) and used them to predict individual patients’ mortality, on average 10 days in advance, with more than 90% accuracy. In particular, high levels of LDH alone were found to play a crucial role in identifying the vast majority of cases requiring immediate medical attention. However, no scoring system was reported in that work that could help clinicians quantitatively identify patients at risk. Moreover, the claim of prediction 10 days in advance is an average metric: some patients were admitted 32 days before their actual outcome while others died on the day of admission, so this average does not reflect the model’s performance at the individual level.

Another clinical study on 82 COVID-19 patients showed that respiratory, cardiac, hemorrhagic, hepatic, and renal injury had caused the death of 100%, 89%, 80.5%, 78.0%, and 31.7% of patients, respectively. Most of the patients had increased CRP (100%) and D-dimer (97.1%) [22]. D-dimer was also shown to be a prognostic factor: a level greater than 1 μg/mL at admission significantly increases the odds of death [23, 24].

Although several predictive prognostic models have been proposed for the early detection of individuals at high risk of COVID-19 mortality, a major gap remains in the design of state-of-the-art interpretable machine learning-based algorithms and high-performance quantitative scoring systems built on the most selective predictive biomarkers of patient death. Identifying and prioritizing those at severe risk is important for both resource planning and treatment. Moreover, it should be possible to continuously monitor high-risk patients with a reliable scoring tool during their hospital stay. Likewise, reducing admissions of patients at very low risk of complications, who can be handled safely by self-quarantine, will help to minimize the load on healthcare facilities.

The aim of this study is to provide a simple, easy-to-use, and reliable scoring system for the prognosis of risk severity in individuals suffering from COVID-19, to stratify them into appropriate risk groups and provide necessary health support accordingly. Therefore, using state-of-the-art machine learning algorithms, an early warning system for mortality risk prediction was developed using a nomogram-based scoring system. We identified the discriminatory biomarkers that contributed most to classifying death and survival of COVID-19 patients among the 76 biomarkers available in the dataset. The top-ranked features and the best-performing classification model were used to develop and validate a multivariable logistic regression-based nomogram for the prognosis of death and survival. We also investigated how useful this model is for monitoring patients’ death risk longitudinally.

Methodology

Human Subjects and Study Design

Blood samples collected between 10 January and 18 February 2020 from 375 patients in Wuhan, China, were retrospectively analyzed to identify reliable and relevant markers of mortality risk. Medical records were collected using standard case report forms, which included epidemiological, demographic, clinical, laboratory, and mortality-outcome information. Yan et al. [1] published the dataset along with their article, and the original study was approved by the Tongji Hospital Ethics Committee.

Patients’ exclusion criteria for the study were: age < 18 years, pregnancy, breastfeeding, and more than 20% missing data. Of the 375 patients, 187 (49.9%) had fever, while cough, fatigue, dyspnea, chest distress, and muscular soreness were present in 52 (13.9%), 14 (3.7%), 8 (2.1%), 7 (1.9%), and 2 (0.5%) patients, respectively.

Statistical Analysis

Stata/MP 13.0 software was used for the statistical analysis. Gender distribution was described using counts and percentages. Continuous variables (age and the other biomarkers) were reported with the number of missing values, median, mean, and quartiles (Q1, Q3) for the death and survival groups. Wilcoxon rank-sum tests were conducted for all continuous variables, while the chi-squared test was used for categorical variables such as gender. A P value < 0.05 was considered statistically significant. There were 76 biomarkers in the original dataset; of these, 14 biomarkers identified as promising using two different algorithms are summarized in Table 1. These 14 biomarkers were lactate dehydrogenase (LDH), neutrophils (%), lymphocyte (%), high-sensitivity C-reactive protein (hs-CRP), serum sodium, eosinophil (%), serum chloride, monocyte (%), international normalized ratio (INR), activated partial thromboplastin time (APTT), high-sensitivity cardiac troponin I, brain natriuretic peptide precursor (NT-proBNP), albumin, and mean corpuscular hemoglobin concentration (MCHC).
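The univariate comparisons described above can be sketched in code as follows, assuming a pandas DataFrame with a binary outcome column; the column names and synthetic values are illustrative, not taken from the released dataset (the study itself used Stata):

```python
# Sketch of the univariate group comparisons: a rank-sum test for a
# continuous biomarker and a chi-squared test for a categorical variable.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "outcome": rng.integers(0, 2, 200),   # 0 = survival, 1 = death
    "gender": rng.integers(0, 2, 200),    # hypothetical binary variable
    "ldh": rng.normal(300, 120, 200),     # hypothetical biomarker values
})

# Wilcoxon rank-sum test for a continuous biomarker (death vs. survival)
dead = df.loc[df.outcome == 1, "ldh"]
alive = df.loc[df.outcome == 0, "ldh"]
stat, p_cont = stats.ranksums(dead, alive)

# Chi-squared test for a categorical variable such as gender
table = pd.crosstab(df.gender, df.outcome)
chi2, p_cat, dof, _ = stats.chi2_contingency(table)

print(f"rank-sum P = {p_cont:.3f}, chi-squared P = {p_cat:.3f}")
```

A biomarker would be kept as "statistically significant" when its P value falls below 0.05, mirroring the criterion used in the paper.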

Table 1 Statistical analysis of the characteristics of the subjects’ data

Of the 375 patients, 174 (46.4%) died, while 201 (53.6%) recovered from COVID-19 and were discharged from hospital. Figure 1 summarizes the outcomes of patients based on their initial conditions: general (197), severe (27), and critical (151). The minimum, maximum, and median follow-up times (from hospital admission to death or discharge) for all 375 patients were 0, 35, and 12 days, respectively.

Fig. 1
figure 1

Patients’ outcome tree with the initial condition of the patients at admission

Table 1 summarizes the demographic characteristics, clinical characteristics, and clinical outcomes of the subjects in the death and survival groups. There were 142 (37.9%) patients who were Wuhan residents, 2 (0.5%) who had contact with confirmed or suspected patients, 24 (6.4%) from familial clusters, 7 (1.9%) health workers, 2 (0.5%) who had contact with the Huanan Seafood Market, and 198 (52.5%) with no contact history.

Two hundred twenty-four (59.7%) patients were male and 151 (40.3%) were female, and the mean age of the patients was 58.83 years with a standard deviation of 16.46 years. Although 76 demographic, laboratory, and clinical characteristics were available in the dataset, 14 biomarkers and two demographic variables were identified using feature ranking. Two different feature ranking techniques each produced a top-10 list of most-contributing features (Fig. 2). Some features were common to both techniques, resulting in 15 distinct features that contribute most to the early prediction of death.

Fig. 2
figure 2

Comparison of the top 10 features identified using the Multi-Tree XGBoost algorithm from data imputed using MICE (top) and − 1 (bottom)

Detailed descriptions of the 16 characteristics are listed in Table 1. Gender, age, LDH, neutrophils (%), lymphocyte (%), hs-CRP, eosinophil (%), monocyte (%), INR, high-sensitivity cardiac troponin I, NT-proBNP, and albumin showed statistically significant differences between the groups (P < 0.05), whereas serum sodium, serum chloride, APTT, and MCHC did not (P > 0.05). Thus, 12 of the 16 characteristics were statistically significant, making it important to determine which variables are most useful for the early prediction of death.

Feature Ranking

Even though multiple blood samples per patient were available, only the data from the first sample were used as inputs for model training and validation to identify the key predictors of disease severity. The model also helps to distinguish patients who require immediate medical assistance. Research using clinically captured data often suffers from missing data, which can introduce bias or negatively affect analytical outcomes. A simple approach to this challenge is to delete the affected rows from further analysis. However, as stated in [25], deleting rows with missing data can discard valuable information and lead to biased estimates. If the dataset is large, deleting rows has less impact; in this study, however, only 375 patients’ data are available, so discarding rows with missing data would significantly affect model performance.

There are many popular techniques for imputing missing data. Multiple imputation using chained equations (MICE) is the most popular technique for clinical data imputation [26]. MICE uses multiple regression models to predict the missing data from the other available variables in the dataset, taking the datatype of each missing variable into account: binary variables are predicted using logistic regression, while continuous variables are predicted using predictive mean matching [25]. The imputed data are then normalized using the Z score [27, 28]. Yan et al. [1] released the database of 76 biomarkers from 375 infected patients in China and developed a prognostic model for the mortality risk of COVID-19 patients. In their study, missing data were padded with “− 1”, as is also commonly done by researchers, and normalized by Z score. In this study, two data imputation techniques are compared: imputation with − 1 and imputation using MICE. In both cases, normalization was carried out using the Z score.
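The two imputation strategies compared above can be sketched as follows, using scikit-learn's IterativeImputer as a MICE-style imputer (the paper does not name a specific implementation, so this is only a stand-in); the small array is illustrative:

```python
# Minimal sketch of the two imputation strategies followed by z-score
# normalization, on a tiny illustrative array with missing values.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])

# Strategy 1: pad missing values with -1, as in Yan et al. [1]
X_minus1 = SimpleImputer(strategy="constant", fill_value=-1).fit_transform(X)

# Strategy 2: chained-equations (MICE-style) imputation
X_mice = IterativeImputer(random_state=0).fit_transform(X)

# Both variants are then z-score normalized
X_minus1_z = StandardScaler().fit_transform(X_minus1)
X_mice_z = StandardScaler().fit_transform(X_mice)
print(X_mice_z.round(2))
```

Note that predictive mean matching, mentioned above for continuous variables, differs from IterativeImputer's default regression-based draw; a full MICE implementation such as the one in Stata or R's `mice` package would follow the paper more closely.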

After data imputation, each of the 76 parameters of the dataset was assessed to identify the top-ranked biomarkers (14 biomarkers in addition to age and gender) as mortality predictors. Two different sets of top-10 features, ranked by importance, were identified using the Multi-Tree Extreme Gradient Boosting (XGBoost) technique [29], one for each imputation approach discussed above. In XGBoost, the importance of each feature is derived from its accumulated use in each decision step of the trees. The approach is extremely useful when dealing with clinical parameters [1, 30]. The following XGBoost settings were used: maximum depth = 4, learning rate = 0.2, number of tree estimators = 150, regularization parameter α = 1, and “subsample” and “colsample_bytree” both set to 0.9 to avoid overfitting in cases with many features and a limited sample size [1, 31].
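The ranking step can be sketched as follows. The paper uses Multi-Tree XGBoost; scikit-learn's GradientBoostingClassifier stands in here so the sketch needs no extra dependency (the XGBoost-specific parameters reg_alpha and colsample_bytree are therefore omitted), and synthetic data replaces the cohort:

```python
# Sketch of tree-ensemble feature ranking with the hyperparameters quoted
# above; synthetic data stands in for the 375-patient, 76-biomarker cohort.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(375, 76))   # 76 candidate biomarkers
y = rng.integers(0, 2, 375)      # 0 = survival, 1 = death

model = GradientBoostingClassifier(
    max_depth=4, learning_rate=0.2, n_estimators=150, subsample=0.9,
    random_state=0,
).fit(X, y)

# Rank features by accumulated importance and keep the top 10
top10 = np.argsort(model.feature_importances_)[::-1][:10]
print("top-10 feature indices:", top10.tolist())
```

With the actual `xgboost` package, the same pattern applies via `XGBClassifier(..., reg_alpha=1, colsample_bytree=0.9)` and its `feature_importances_` attribute.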

Development and Validation of the Logistic Regression Model in Classifying the Outcome

Logistic regression is a model commonly used in medical statistics; it is a statistical learning technique belonging to the supervised machine learning (ML) methods dedicated to classification tasks [32]. It is particularly useful for estimating the probability of a binary outcome such as the survival or death of a patient [33]. The logistic function is a sigmoid that maps real-valued continuous inputs to a probability. It also makes the coefficients more resistant to deviations of the independent variables from normality, and thus more consistent [32]. Therefore, this study uses a supervised logistic regression classifier [34] as the predictor model.

Receiver operating characteristic (ROC) curves for the test data were used to calculate the area under the curve (AUC) for the predictor variables separately and in combination. The AUC values were used to evaluate how well different sets of top-ranked features classify death and survival cases. After obtaining the top-ranked features, the logistic regression classifier discussed above was evaluated with different combinations of features as model inputs. The different combination models were validated using fivefold cross-validation (80% of the data were used for training and validation while the remaining 20% were used for testing, repeated 5 times). The performance of the different models was evaluated on the testing dataset using several metrics, including sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR). Per-class values were computed over the overall confusion matrix, which accumulates the test (unseen) fold results of the fivefold cross-validation.
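The cross-validation scheme above, with each fold's test predictions accumulated into one overall confusion matrix, can be sketched as follows (synthetic data; the classifier settings are illustrative):

```python
# Fivefold cross-validation of a logistic regression classifier,
# accumulating the unseen-fold results into one overall confusion matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(375, 5))    # e.g. the top-5 ranked features
y = rng.integers(0, 2, 375)      # 0 = survival, 1 = death

cm = np.zeros((2, 2), dtype=int)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    cm += confusion_matrix(y[test_idx], pred, labels=[0, 1])

print(cm)  # rows: true class, columns: predicted class
```

Every sample appears in exactly one test fold, so the accumulated matrix covers all 375 patients once.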

$$\mathrm{Sensitivity}_{\mathrm{class}_{i}}=\frac{TP_{\mathrm{class}_{i}}}{TP_{\mathrm{class}_{i}}+FN_{\mathrm{class}_{i}}}$$
(1)
$$\mathrm{Specificity}_{\mathrm{class}_{i}}=\frac{TN_{\mathrm{class}_{i}}}{TN_{\mathrm{class}_{i}}+FP_{\mathrm{class}_{i}}}$$
(2)
$$\mathrm{PLR}_{\mathrm{class}_{i}}=\frac{\mathrm{Sensitivity}_{\mathrm{class}_{i}}}{1-\mathrm{Specificity}_{\mathrm{class}_{i}}}$$
(3)
$$\mathrm{NLR}_{\mathrm{class}_{i}}=\frac{1-\mathrm{Sensitivity}_{\mathrm{class}_{i}}}{\mathrm{Specificity}_{\mathrm{class}_{i}}}$$
(4)
$$\mathrm{where}\ \mathrm{class}_{i}\in\{\mathrm{survival},\,\mathrm{death}\}.$$

In Eqs. (1)–(4), true positive (TP), true negative (TN), false positive (FP), and false negative (FN) denote the number of deceased patients identified as deceased, the number of survivors identified as survivors, the number of survivors incorrectly identified as deceased, and the number of deceased patients incorrectly identified as survivors, respectively.
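A worked example of Eqs. (1)–(4) for the "death" class, with illustrative confusion-matrix counts (not the study's actual results):

```python
# Per-class metrics of Eqs. (1)-(4) computed from a 2x2 confusion matrix,
# with "death" taken as the positive class; the counts are hypothetical.
tp, fn = 160, 14     # deceased predicted as deceased / as survivors
fp, tn = 12, 189     # survivors predicted as deceased / as survivors

sensitivity = tp / (tp + fn)                 # Eq. (1)
specificity = tn / (tn + fp)                 # Eq. (2)
plr = sensitivity / (1 - specificity)        # Eq. (3), positive likelihood ratio
nlr = (1 - sensitivity) / specificity        # Eq. (4), negative likelihood ratio

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
print(f"PLR={plr:.2f}, NLR={nlr:.3f}")
```

A large PLR and a small NLR together indicate a test that both rules in and rules out the outcome well.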

Development and Validation of Logistic Regression-Based Nomogram in the Outcome Prediction

A diagnostic nomogram was constructed using Stata/MP software version 13.0 and Nomolog (developed by Alexander Zlotnik) [35], based on multivariate logistic regression analysis. Logistic (logit) regression estimates its parameters in the form of a binary regression and works with probabilities, odds, and regression. In the binary logistic model, the outcome/indicator variable has two possible values; this dependent variable is typically labeled “0” and “1”, where “0” represents survival and “1” represents death in this case. The odds are the ratio of the probability (P) of an event happening to the probability of it not happening, as shown in Eq. (5). Although the probability varies between 0 and 1, the odds vary between 0 and ∞. In logistic regression, the logarithm of the odds is a linear combination of one or more independent variables (“predictors”), which can be binary (e.g., gender) or continuous (e.g., age). The log-odds, termed the linear prediction (LP) as shown in Eq. (6), can be related to the probability of a particular outcome. Equations (5)–(8) establish the relationship between death probability and the key predictors through logistic regression.

$$odds=\frac{P}{1-P}$$
(5)
$$LP=\mathrm{ln}\left(odds\right)=\mathrm{ln}\left(\frac{P}{1-P}\right)= {b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{n}{x}_{n}$$
(6)
$$\frac{P}{1-P}= {e}^{{b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{n}{x}_{n}}={e}^{LP}$$
(7)
$$P= \frac{{e}^{LP}}{1+{e}^{LP}}= \frac{1}{1+{e}^{-LP}}$$
(8)

The top-ranked features (independent variables) showing the best AUC were used to create the logistic regression-based nomogram. The entire dataset was divided into training (70%) and validation (30%) sets. Calibration curves for internal (development set) and external (validation set) validation were plotted to compare predicted and actual death probabilities of patients with COVID-19. Decision curve analysis (DCA) was carried out using Stata software to identify the range of threshold probabilities over which the nomogram is clinically useful.
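Decision curve analysis weighs true positives against false positives at each threshold probability pt via the standard net-benefit formula NB = TP/n - FP/n * pt/(1 - pt). A minimal sketch, with synthetic predictions standing in for the nomogram's output:

```python
# Sketch of decision curve analysis: net benefit across thresholds.
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)                                  # actual outcome
probs = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 300), 0, 1)   # predicted risk

def net_benefit(y, probs, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    pred = probs >= pt
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    n = len(y)
    return tp / n - fp / n * pt / (1 - pt)

for pt in (0.1, 0.3, 0.5, 0.7):
    print(f"pt={pt:.1f}: net benefit={net_benefit(y, probs, pt):.3f}")
```

A model is clinically useful over the threshold range where its net benefit exceeds both "treat all" and "treat none" (net benefit 0), which is what Fig. 5 visualizes.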

Development and Validation of Early Warning Score

The parameters are drawn as numbered horizontal axis scales, and the patient’s values are located on those scales. A vertical line is drawn down from each parameter scale to a score axis. The five scores on the score axis are added to give a total score, which is linked to a death probability. Note that, according to the nomogram, a higher score corresponds to a higher death probability. The model was designed using the patients’ initial blood samples; however, it can also be applied to biomarkers collected later during the hospital stay to predict death probability longitudinally using the LNLCA score.

Results

This section discusses (a) the performance comparison of the two imputation techniques, (b) the validation of the nomogram in predicting death using the best data imputation technique and the five top-ranked features, and (c) the evaluation of the prognostic model.

Performance Comparison of Two Imputation Techniques

To identify the best data imputation technique and determine the most contributing independent variables associated with death, the data imputed with the two techniques were investigated using the top-1, top-2, and up to top-10 features in each case. It is clear from Fig. 3 that the top-ranked 5 features produced the highest AUC of 0.97 for data imputed using the MICE algorithm, while the top-ranked 3 features produced the highest AUC of 0.95 for data imputed using − 1. Table 2 shows the overall accuracies, the weighted average performance for the other metrics, and the confusion matrices for the models using the top 1 to 10 features under fivefold cross-validation with the logistic regression classifier.

Fig. 3
figure 3

Comparison of the receiver operating characteristic (ROC) plots for the top-ranked 1 to 10 features with data imputed using MICE (left) and − 1 (right); the feature selection and classification techniques were the same

Table 2 Comparison of the average performance metrics and confusion matrices from fivefold cross-validation for the top 1 to 10 features, with data imputed using − 1 (A) and MICE (B)

The top-ranked 5 features with MICE imputation showed better performance than the top-ranked 4 features with (− 1) imputation. Therefore, in the rest of the study, the 5 top-ranked MICE-imputed independent variables (lactate dehydrogenase, neutrophils (%), lymphocyte (%), high-sensitivity C-reactive protein, and age; LNLCA for short) were used for nomogram creation and for the development and validation of the scoring technique.

Evaluation of Nomogram in Predicting Death

A multivariate logistic regression-based nomogram for predicting early COVID-19 mortality was built using the top-ranked five biomarkers found important both statistically and by the ML-based classifier (Tables 1 and 2 and Fig. 3). The relationship between the linear prediction of death and these biomarkers was evaluated using multivariable logistic regression with bootstrapping (1000 repetitions); the regression coefficients, z values, standard errors, statistical significance, and 95% confidence intervals are reported in Table 3. The z value is the ratio of a regression coefficient to its standard error and typically indicates the strong and weak contributors in a logistic regression: higher z values (positive or negative) represent strong contributors, while values close to 0 represent weak ones. By this measure, neutrophils (%) is not a very strong predictor, while age and lactate dehydrogenase are strong contributors. Testing the null hypothesis for a particular regression coefficient yields a P value relating the significance of that X variable to the Y variable; X variables with P < 0.05 have a significant relationship to Y. This also shows that neutrophils (%) is only weakly related to the outcome. However, the logistic regression classifier shows that 5 variables outperform 4; therefore, none of these 5 variables was discarded in developing the nomogram.

Table 3 The logistic regression analysis to construct the nomogram for death prediction

According to Fig. 4, the calibration plots lie close to the diagonal line for both internal and external validation, indicating a reliable model. It is evident from Fig. 5 that the net benefit of every single-predictor model is positive up to a threshold of 0.85, indicating that all of them contribute to the prediction of outcomes. Notably, the full model demonstrated the best performance, confirming the need to combine the five predictors in one model.

Fig. 4
figure 4

Calibration plot comparing predicted and actual death probability of patients with COVID-19. a Internal validation. b External validation

Fig. 5
figure 5

Decision curves analysis comparing different models to predict the death probability of patients with COVID-19. The net benefit balances the mortality risk and potential harm from unnecessary over-intervention for patients with COVID-19

As shown in Fig. 6, the nomogram comprises 8 rows, with rows 1–5 representing the independent variables. For each variable, an assigned score is obtained by drawing a vertical line down from the value on the variable axis to the “points” axis using the COVID-19 patient’s data. The points of the five variables correspond to scores (row 6), and the scores are added up to a total score, shown in row 8. A line can then be drawn from the “Total Score” axis to the “Prob” axis (row 7) to determine the death probability of the COVID-19 patient. It is also useful to derive the mathematical equations for the total score, linear prediction, and death probability on which the LNLCA score is based, following Eqs. (5)–(8):

Fig. 6
figure 6

Multivariate logistic regression-based nomogram to predict the probability of death. Nomogram for prediction of death was created using the following five predictors: lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age

$$\mathrm{Linear\ prediction}=-3.662636+0.0735038\times \mathrm{age\ (years)}+0.0110451\times \mathrm{hsCRP\ (mg/L)}-0.1624422\times \mathrm{lymphocyte\ (\%)}-0.0327053\times \mathrm{neutrophils\ (\%)}+0.0070514\times \mathrm{lactate\ dehydrogenase\ (U/L)}$$
(9)
$$\mathrm{Death\ probability}=\frac{1}{1+\mathrm{exp}(-\mathrm{linear\ prediction})}$$
(10)
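Equations (9) and (10) translate directly into code, using the coefficients reported above; the patient values in the example are illustrative, not drawn from the cohort:

```python
# Eqs. (9)-(10) as code: the fitted linear prediction and the resulting
# death probability, using the reported regression coefficients.
import math

def death_probability(age, hs_crp, lymphocyte_pct, neutrophils_pct, ldh):
    """Death probability from the five LNLCA predictors (Eqs. 9-10)."""
    lp = (-3.662636
          + 0.0735038 * age               # years
          + 0.0110451 * hs_crp            # mg/L
          - 0.1624422 * lymphocyte_pct    # %
          - 0.0327053 * neutrophils_pct   # %
          + 0.0070514 * ldh)              # U/L
    return 1 / (1 + math.exp(-lp))

# Hypothetical patient values for illustration only
p = death_probability(age=70, hs_crp=100, lymphocyte_pct=5,
                      neutrophils_pct=85, ldh=600)
print(f"predicted death probability: {p:.2f}")
```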

The probability of death corresponding to each LNLCA score was determined from the model and is listed in Table 4. In particular, LNLCA cutoff values of 10.4 and 12.65 correspond to death probabilities of 5% and 50% (based on [20]); these values can thus be used to stratify COVID-19 patients into three groups. The death probabilities were less than 5%, between 5 and 50%, and more than 50% for the low-risk (LNLCA < 10.4), moderate-risk (10.4 ≤ LNLCA ≤ 12.65), and high-risk (LNLCA > 12.65) groups, respectively.
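The stated cutoffs give a simple stratification rule, transcribed directly from the thresholds above:

```python
# LNLCA risk stratification using the cutoffs 10.4 and 12.65.
def risk_group(lnlca_score):
    if lnlca_score < 10.4:
        return "low"       # death probability < 5%
    if lnlca_score <= 12.65:
        return "moderate"  # death probability 5-50%
    return "high"          # death probability > 50%

for score in (8.0, 11.5, 14.2):   # illustrative scores
    print(score, risk_group(score))
```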

Table 4 LNLCA score from nomogram and corresponding death probability of COVID-19 patients

Performance Evaluation of the Model

Figure 7 shows an example of the nomogram-based scoring system for a COVID-19 patient using the variable values at admission. The individual score for each predictor was calculated and summed to produce the total score, from which the death probability was calculated as 80%. This prediction was made as early as 9 days before the death of the patient.

Fig. 7
figure 7

An example nomogram-based score to predict the probability of death of a COVID-19 patient from test set (9 days before the actual outcome)

Furthermore, we categorized the patients from the training and testing sets into three subgroups (low, moderate, and high risk) and compared the actual outcome with the outcome predicted by the LNLCA score. For the training set (Table 5), the proportions of death were 0% (0/83) in the low-risk group, 22.6% (12/53) in the moderate-risk group, and 88.1% (111/126) in the high-risk group, while for the test set (Table 6) they were 0% (0/41), 22.7% (5/22), and 94% (47/50), respectively. The true death rates were significantly different (P < 0.001) among the three subgroups. Therefore, this nomogram-based scoring technique can be used to predict patients’ outcomes early, categorize them into low-, moderate-, and high-risk groups as shown in Table 4, and prioritize the moderate- and high-risk patients.
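The subgroup-outcome association can be sketched as follows using the training-set counts above. Note that the paper uses Fisher's exact probability test; scipy's `fisher_exact` handles only 2×2 tables, so a chi-squared test on the 3×2 table stands in here:

```python
# Association between risk group and actual outcome (training-set counts).
# The paper uses Fisher's exact test; chi-squared is used here as a
# stand-in because scipy's fisher_exact is limited to 2x2 tables.
import numpy as np
from scipy.stats import chi2_contingency

#                 survived, died
table = np.array([[83,   0],    # low risk      (0/83 died)
                  [41,  12],    # moderate risk (12/53 died)
                  [15, 111]])   # high risk     (111/126 died)

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, P = {p:.2e}")  # P < 0.001, matching the paper
```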

Table 5 Association between different risk groups and actual outcome in the training cohort using Fisher exact probability test
Table 6 Association between different risk groups and actual outcome in the testing cohort using Fisher exact probability test

There were 52 patients in the test set whose outcome was death after varying durations of hospital stay; some were hospitalized at a very late stage while others were admitted early. The minimum, maximum, mean (± standard deviation), and median times from hospital admission to death in the test set were 3.68, 760.92, 249.2 ± 227.55, and 172.79 h, respectively. Most of the 375 patients in the cohort had multiple blood samples taken throughout their hospital stay. The LNLCA prediction score was calculated at admission and for each subsequent available sample, to identify the earliest time after admission at which the model places the patient in the high-risk group. Figure 8 shows the difference in hours between hospital admission and death, and also when the model can predict the eventual outcome with 100% accuracy. It is evident from Fig. 8 that for most of the 52 patients the model can predict the outcome within several hours of admission. The minimum, maximum, mean (± standard deviation), and median times from the model’s high-risk prediction to death in the test set were 3.68, 756.11, 239.85 ± 228.56, and 156.36 h, respectively. For one patient, the model even predicted the outcome 31.5 days in advance with a probability of 97%. This early prediction suggests that, where a patient’s condition deteriorates, the model can give clinicians an early warning several days in advance.

Fig. 8
figure 8

Estimated prediction of the patients’ outcome for the 52 test patients with a death outcome. The model was trained on the data present at admission, and multiple samples from each patient were used to find the earliest time after admission at which the model places the patient in the high-risk group. Note: “0” denotes the death event for each patient, and vertical lines represent the time of admission relative to death. The solid red line starts from the earliest time point of death prediction, and the dotted line represents the delay between admission and the LNLCA model’s death prediction

Discussion

Yan et al. [1] ranked three biomarkers (lactic dehydrogenase (LDH), lymphocyte, and high-sensitivity C-reactive protein (hs-CRP)) and used them to predict individual patients’ mortality with a machine learning model. However, no scoring system was reported in that work to help clinicians quantitatively identify patients at risk. The current study investigated the relationship between disease severity and different clinical biomarkers and developed a scoring system to stratify patient severity so that clinicians can easily identify patients at risk.

Ten predictors of death probability were identified by the Multi-Tree XGBoost algorithm from the data acquired at hospital admission. Two prediction pipelines were compared, one imputing missing data with −1 and one using the MICE algorithm. Ten classification models were trained, validated, and tested on the top 1 to 10 features under both imputation techniques. The AUC and performance metrics showed that the MICE-based approach outperformed the alternative, achieving AUC = 0.97 with the top five features. A logistic regression-based nomogram was then developed using these five variables, and an integrated score (LNLCA) with a corresponding likelihood of death was obtained for the early stratification of COVID-19 patients by predicted severity. This can help healthcare facilities to be used effectively without overloading their capacity.
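The pipeline described above can be sketched with scikit-learn stand-ins. This is a hedged illustration, not the authors' code: `IterativeImputer` approximates MICE, a gradient-boosted tree model stands in for Multi-Tree XGBoost, and the synthetic data and all settings are assumptions.

```python
# Illustrative sketch of the three-stage pipeline: MICE-style imputation,
# tree-based feature ranking, then logistic regression on the top features.
# All data below are synthetic; nothing here reproduces the study's cohort.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 patients, 10 admission features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic death outcome
X[rng.random(X.shape) < 0.1] = np.nan    # inject ~10% missing values

# Step 1: multivariate imputation of missing admission values (MICE-like).
X_imp = IterativeImputer(random_state=0).fit_transform(X)

# Step 2: rank features by tree-ensemble importance (XGBoost stand-in)
# and keep the five top-ranked features.
gbt = GradientBoostingClassifier(random_state=0).fit(X_imp, y)
top5 = np.argsort(gbt.feature_importances_)[::-1][:5]

# Step 3: fit logistic regression on the top features; its linear predictor
# is the kind of quantity a nomogram turns into an integrated score (LNLCA).
lr = LogisticRegression().fit(X_imp[:, top5], y)
death_prob = lr.predict_proba(X_imp[:, top5])[:, 1]
print("top features:", sorted(top5.tolist()))
```

In the study itself, the fitted logistic model's coefficients are what the nomogram renders as per-variable point scales that sum to the LNLCA score.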

Age was identified as a key predictor of mortality in previous studies of the coronavirus family, including SARS [36], Middle East respiratory syndrome (MERS) [37], and COVID-19 [38]. The present study reached a similar conclusion: with older age, immunosenescence and/or multiple comorbidities tend to make patients more prone to critical COVID-19 illness [20].

Yan et al. [17] showed that in patients with severe pulmonary interstitial disease, LDH is significantly increased and can be associated with lung injury or idiopathic pulmonary fibrosis [39]. Consistent with that research, critically ill COVID-19 patients in this study had elevated LDH levels, suggesting increased activity and severity of lung injury. LDH is an intracellular enzyme that leaks from cells damaged by infection and viral replication, leading to elevated levels in circulation.

Recently, Liu et al. [40] proposed that an increased neutrophil-to-lymphocyte ratio (NLR) can aid in the early prediction of COVID-19 severity. Both neutrophils and lymphocytes are critical components of the immune system and play a very important role in host defense and clearing infections. Lymphopenia, a condition of abnormally low lymphocyte count in the blood, is a typical feature in COVID-19 patients and may be a key factor in disease severity and mortality [41]. In this study, we used neutrophil and lymphocyte percentages and, in line with previous studies, found that lower values of these two quantities were associated with severe COVID-19. According to previous research, patients with community-acquired pneumonia have significant immune system activation and/or immune dysfunction leading to changes in these quantities [41]. In addition, in the event of immunosuppression and lymphocyte apoptosis caused by specific anti-inflammatory cytokines, the bone marrow releases neutrophils into circulation [42], resulting in an increased NLR. However, in contrast to other models, this study observed that both percentages were low in high-risk patients.
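The NLR referenced above is a simple ratio of the two percentages used as model features. A minimal sketch, with purely illustrative values rather than patient data:

```python
# Minimal sketch: neutrophil-to-lymphocyte ratio (NLR) from the two
# percentage features used in this study. Example values are illustrative.
def nlr(neutrophil_pct, lymphocyte_pct):
    """NLR = neutrophils (%) / lymphocytes (%). Because both percentages
    share the same white-cell denominator, this equals the ratio of the
    absolute counts."""
    if lymphocyte_pct <= 0:
        raise ValueError("lymphocyte percentage must be positive")
    return neutrophil_pct / lymphocyte_pct

print(nlr(80.0, 10.0))  # 8.0 -> elevated NLR, the pattern Liu et al. flag
print(nlr(60.0, 30.0))  # 2.0 -> a lower, less concerning ratio
```

Note that this study used the two percentages as separate model inputs rather than their ratio, which is why it could observe both being low in high-risk patients.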

Lu et al. [43] stated that CRP measured at admission may assist in predicting short-term mortality in confirmed or suspected COVID-19 cases. CRP is an acute-phase protein produced by hepatocytes in response to leukocyte-derived cytokines induced by infection, inflammation, or tissue damage [44,45,46]. Similar findings were obtained in this study, where elevated CRP levels were measured at admission in high-mortality-risk COVID-19 patients. This indicates that these patients had developed serious lung inflammation or possibly a secondary bacterial infection, for which clinical antibiotic treatment might be appropriate [1].

Non-survivors in our study had lower lymphocyte and neutrophil percentages, and higher age, hs-CRP, and LDH, than survivors. In addition to dysregulation of the coagulation and immune systems, COVID-19 severity was significantly linked to the inflammatory response to the infection, which can lead to worse outcomes such as ARDS, septic shock, and coagulopathy. A prognostic model of this kind will therefore aid in developing rational and personalized therapeutic plans for critically ill patients.

Weng et al. [20] recently suggested that age, NLR, D-dimer, and CRP were individual key predictors of death probability, and used them to create a nomogram for COVID-19 death prediction. In our research, five key predictors recorded at admission were chosen by XGBoost feature selection to create a nomogram-based prognostic model that exhibits excellent calibration and discrimination in predicting the death probability of COVID-19 patients. The model was validated on an unseen validation cohort and further verified with multiple blood samples collected from patients during their hospital stay, for which it also holds. The AUC values for the development and validation cohorts showed strong discrimination of 0.961 and 0.991, respectively, which, to the best of our knowledge, outperforms other nomogram-based models for COVID-19 mortality prediction. In addition, the nomogram-derived LNLCA score offers a simple, easy-to-understand, and interpretable early detection tool for stratifying high-risk COVID-19 patients at admission, thereby assisting their clinical management. COVID-19 patients were categorized into three risk groups with differing risk of death using the LNLCA score calculated at admission. Low-risk cases could be isolated and treated in an isolation center, while moderate-risk patients could be treated in an isolation ward of a specialized hospital. Patients in the high-risk group should be closely monitored and moved to critical medical services or the ICU for urgent treatment if required.
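The three-way triage described above reduces to comparing the admission LNLCA score against two cutoffs. The sketch below is hedged: the cutoff values and `triage` function are placeholders for illustration, and the paper's actual thresholds should be read from its nomogram, not from this snippet.

```python
# Hedged sketch of the three-group triage rule. LOW_CUTOFF and HIGH_CUTOFF
# are assumed placeholder values, NOT the study's published thresholds.
LOW_CUTOFF = 10.0
HIGH_CUTOFF = 12.0

def triage(lnlca_score):
    """Map an admission LNLCA score to an illustrative management pathway."""
    if lnlca_score < LOW_CUTOFF:
        return "low risk: isolation center"
    if lnlca_score < HIGH_CUTOFF:
        return "moderate risk: isolation ward in a specialized hospital"
    return "high risk: close monitoring, critical care/ICU if required"

print(triage(9.2))
print(triage(13.1))
```

In deployment, the score would come from the nomogram's summed point scales, and the cutoffs would be those calibrated on the development cohort.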

This study has scope for further improvement, which will be pursued in future work. Firstly, the study demonstrates the potential of COVID-19 clinical data for early mortality prediction, but the proposed machine learning method is purely data-driven, and its results may vary when starting from different datasets; the model can be further improved with a larger dataset. Secondly, the modelling principle adopted here, using a minimal number of features for accurate prediction to avoid overfitting, can be revisited with several other models to identify alternative sets of best features on multi-center, multi-country data and produce a more generalized model.

Conclusion

In summary, based on multiple risk factors (lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age), our nomogram can predict the prognosis of patients with COVID-19 with good discrimination and calibration. The model can predict the patient's outcome well ahead of the day of the primary clinical outcome with very high accuracy. Application of the LNLCA score would therefore help clinicians make an efficient and optimized patient stratification plan without overloading healthcare resources, and reduce mortality through an improved and planned response. A Web-based application built on the model is available for clinicians (https://www.openasapp.net/portal#!/client/app/2309220f-105e-4e7d-9fe0-51cbef5bb18c). The authors also plan to further improve the model's performance using larger, multicenter, multicountry datasets.