
A novel method detecting the key clinic factors of portal vein system thrombosis of splenectomy & cardia devascularization patients for cirrhosis & portal hypertension

Abstract

Background

Portal vein system thrombosis (PVST) is potentially fatal if it is not diagnosed in time or treated properly. No technique has yet been available to detect the clinical risk factors that predict PVST after splenectomy in cirrhotic patients. This study aims to detect the clinical risk factors of PVST in patients who underwent splenectomy and cardia devascularization for liver cirrhosis and portal hypertension, and to use machine learning to build an efficient predictive model of PVST from the detected risk factors. We collected 33 clinical indexes of 92 patients who underwent splenectomy plus cardia devascularization for cirrhosis and portal hypertension. We then proposed a novel algorithm, RFA-PVST (Risk Factor Analysis for PVST), to detect the clinical risk indexes of PVST, and built an SVM (support vector machine) predictive model on the detected risk factors. Accuracy, sensitivity, specificity, precision, F-measure, FPR (false positive rate), FNR (false negative rate), FDR (false discovery rate), AUC (area under the ROC curve) and MCC (Matthews correlation coefficient) were used to evaluate the predictive power of the detected risk factors. RFA-PVST was compared with mRMR, SVM-RFE, Relief, S-weight and LLEScore, and a statistical test was performed to verify the significance of RFA-PVST.

Results

Anticoagulant therapy and antiplatelet aggregation therapy are the top two clinical risk factors of PVST, followed by D-D (D-dimer), CHOL (cholesterol) and Ca (calcium). The SVM model built on eight clinical indexes predicts PVST well. These indexes are anticoagulant therapy, antiplatelet aggregation therapy, RBC (red blood cell), D-D, CHOL, Ca, TT (thrombin time) and weight. The model achieves the highest PVST predictive accuracy of 0.89. It also achieves the best sensitivity, specificity, precision, F-measure, FNR, FPR, FDR and MCC, of 1, 0.75, 0.85, 0.92, 0, 0.25, 0.15 and 0.8, respectively, and a comparably good AUC of 0.84. The statistical test results show a significant difference between RFA-PVST and the compared algorithms (mRMR, SVM-RFE, Relief, S-weight and LLEScore). In other words, the risk indicators detected by RFA-PVST are statistically significant.

Conclusions

The proposed RFA-PVST algorithm detects the clinical risk factors of PVST effectively and simply. Its main contribution is that it displays all clinical factors in a 2-dimensional space, with discernibility as the x-axis and independence as the y-axis. The clinical indexes in the top-right corner of this space are detected automatically as risk indicators. The SVM model built on the detected clinical risk factors predicts PVST well. Our study can help physicians make proper treatment decisions or early diagnoses for PVST patients. It also brings a new idea to the study of clinical treatment for other diseases.

Background

Portal vein system thrombosis (PVST) refers to the blockage or narrowing by a thrombus of the portal vein, the splenic and superior mesenteric veins, or the intrahepatic portal vein branches [1]. It is relatively rare. Its clinical manifestations range from none to severe complications, including fever, abdominal pain, nausea, vomiting, and ileus [2]. PVST can increase the risk of upper gastrointestinal bleeding, hepatic coma, or even fatal intestinal necrosis [3]. Moreover, PVST makes subsequent liver transplantation more difficult [4, 5]. With advances in imaging examinations, more and more studies have shown that the incidence of PVST after splenectomy is significantly higher than previously reported. The reported incidence of PVST after splenectomy varies greatly, from 0.36% [6] to as much as 80% [7]. This inconsistency comes from differences in examination methods, study types, the timing and frequency of postoperative examinations, the underlying diseases, and other factors [8]. The specific mechanisms that lead to PVST after splenectomy are still unknown. It is generally agreed that several factors are important for the occurrence of PVST: hemodynamic changes of the portal venous system [9,10,11], blood hypercoagulability [3], the blind-ended splenic vein stump left by splenic vein ligation [12], local inflammatory reaction [13], and inappropriate use of coagulants [14]. Some studies have also shown that PVST formation is related to spleen volume, portal vein diameter, prothrombin time (PT), plasma D-dimer level, and the function and quality of platelets rather than the platelet count [15,16,17,18]. The role of early prophylactic anticoagulation in preventing PVST is still controversial because of concerns that it may induce bleeding, especially in cirrhotic patients [19,20,21].
However, in the last decade some studies have shown that procoagulant and anticoagulant elements are reduced together in patients with liver cirrhosis [22, 23]. In these patients, bleeding is mainly due to the severity of portal hypertension, endothelial dysfunction and bacterial infections rather than disturbed hemostasis [24]. These studies provide the scientific basis for prophylactic anticoagulation in these patients. PVST has attracted many researchers [25,26,27,28,29,30], and some have found that prophylactic anticoagulation therapy can effectively prevent PVST after splenectomy even in cirrhotic patients [31]. Nevertheless, no standard regimen for PVST prophylaxis has been developed. Furthermore, no researchers have applied machine learning to detect the risk factors of PVST after splenectomy in cirrhotic patients. This study addresses that gap.

We first propose a novel feature selection algorithm named RFA-PVST (Risk Factor Analysis for PVST) to detect the clinical risk factors of PVST. We then use SVM, a typical learning machine, to build a predictive model of PVST. We collected the clinical data of 92 patients who underwent splenectomy and cardia devascularization for cirrhosis and portal hypertension at a top-level hospital in PR China.

In RFA-PVST, we define two quantities for each index. Its discernibility measures how well it distinguishes PVST patients from non-PVST patients, and its independence measures how different it is from the other indexes. The detected clinical indexes have much higher discernibility and independence than the others. The SVM model built on the detected risk factors can effectively distinguish PVST patients from non-PVST patients. It can help physicians make proper treatment decisions or early diagnoses for potential PVST patients. We ran 5-fold cross validation experiments on the 92 patients and a statistical test between RFA-PVST and well-known feature selection algorithms. The results show that the clinical risk factors detected by RFA-PVST are statistically significant and support a very powerful predictive model.

Results

This section presents the clinical risk factors of PVST detected by RFA-PVST. It also shows how well these risk factors recognize PVST patients, measured by the performance of the SVM model built on them. The metrics are accuracy (abbreviated as Acc in the rest of this paper), sensitivity, specificity, precision, F-measure, FPR (false positive rate), FNR (false negative rate), FDR (false discovery rate), AUC (area under the ROC curve) and MCC (Matthews correlation coefficient). The performance of RFA-PVST is compared with that of existing feature selection algorithms: mRMR [32], SVM-RFE [33], Relief [34], S-weight [35] and LLEScore [36]. The results of the statistical test between RFA-PVST and these algorithms are also presented.

Clinic risk factors of PVST

Figure 1 displays all clinical indexes as circles in the 2-dimensional space, with discernibility as the x-axis and independence as the y-axis. Red circles indicate clinical risk factors. For each of these indexes, the rectangle enclosed by its coordinate lines and the axes has a much larger area than those of the remaining indexes. Table 1 lists the clinical indexes in descending order of risk degree in the 5-fold cross validation experiments. Underlined bold font marks the detected risk factors, which correspond to the red circles in Fig. 1. Table 2 shows the performance on the test subsets of the 5 SVM models from the 5-fold cross validation experiments, in terms of Acc, AUC, sensitivity, specificity, precision, F-measure, FNR, FPR, FDR, and MCC. Table 3 shows the average results of the 5-fold cross validation experiments for each algorithm, using the same metrics as Table 2 under the same conditions. Underlined bold font in Tables 2 and 3 marks the best results.

Fig. 1

Scatter plots of clinic indexes of 5-fold cross validation experiments

Table 1 The clinic indexes ranked in descending order of their risk degrees in the 5-fold cross validation experiments
Table 2 Performance of PVST predictive models on different sets of risk indicators of 5-fold cross validation experiments
Table 3 Experimental results of algorithms of 5-fold cross validation experiments

Statistical test results of RFA-PVST

Table 4 displays the results of Friedman’s test with α = 0.05 for RFA-PVST, mRMR, SVM-RFE, Relief, S-weight and LLEScore. The test is applied to the Acc, AUC, sensitivity, specificity, and precision of the SVM predictive models of PVST. Each model is built on the same number of risk indexes, selected by the respective algorithm.
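Friedman's test ranks the compared algorithms within each block and checks whether their mean ranks differ more than chance would allow. The sketch below is a minimal pure-Python version under several assumptions. Each block is taken to be one cross-validation fold, which gives 5 blocks and 6 algorithms here. Rank 1 goes to the highest score, and tied scores share an average rank. The statistic uses the textbook chi-square form without tie correction. The function names are hypothetical, and the text does not state which implementation the authors used.

```python
import math


def average_ranks(scores):
    """Rank one block of scores: highest score gets rank 1, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied scores.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y), then apply
    # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) until s = df / 2.
    if df % 2 == 1:
        q, s = math.erfc(math.sqrt(y)), 0.5
    else:
        q, s = math.exp(-y), 1.0
    while 2 * s < df:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def friedman(scores_by_block):
    """scores_by_block[i][j]: score of algorithm j on block i (e.g. CV fold i).

    Returns (chi-square statistic, p-value with k - 1 df, mean rank per algorithm).
    """
    n = len(scores_by_block)
    k = len(scores_by_block[0])
    rank_sums = [0.0] * k
    for block in scores_by_block:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1), [r / n for r in rank_sums]
```

For example, suppose `scores[i][j]` holds the Acc of algorithm j on fold i. Then `friedman(scores)` returns the statistic, its approximate p-value, and each algorithm's mean rank. With only 5 blocks the chi-square approximation is rough, so the p-value should be read with care.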

Table 4 The Friedman’s test results with α = 0.05 of our RFA-PVST and mRMR, SVM-RFE, Relief, S-weight and LLEScore

Table 5 displays the multiple comparison test between each pair of algorithms at the 0.95 confidence level, in terms of Acc, AUC, sensitivity, specificity, and precision. In each test, the upper triangle shows the mean rank difference between algorithms. The lower triangle shows the statistical significance for each pair of algorithms, where * marks a strongly significant difference between the two algorithms in that metric.
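The text does not name the post-hoc procedure behind Table 5. The Nemenyi test is one common choice for pairwise comparison after a Friedman test, and the sketch below shows how a table of mean rank differences can be judged with it. Two algorithms differ significantly at α = 0.05 when their mean ranks differ by more than the critical difference CD = q_α·sqrt(k(k + 1)/(6N)). Here k is the number of algorithms and N is the number of blocks. The q_0.05 constants are the values tabulated in Demšar's 2006 paper on comparing classifiers over multiple data sets. The function name and the example mean ranks are hypothetical.

```python
import math

# Nemenyi critical values q_0.05 by number of algorithms k (Demšar, 2006).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
        7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi(mean_ranks, n_blocks):
    """Pairwise Nemenyi comparison from Friedman mean ranks.

    mean_ranks: mean rank of each algorithm over n_blocks blocks (e.g. CV folds).
    Returns the critical difference, the |mean rank difference| matrix and a
    boolean matrix marking significantly different pairs at alpha = 0.05.
    """
    k = len(mean_ranks)
    cd = Q_05[k] * math.sqrt(k * (k + 1) / (6.0 * n_blocks))
    diff = [[abs(a - b) for b in mean_ranks] for a in mean_ranks]
    significant = [[i != j and diff[i][j] > cd for j in range(k)]
                   for i in range(k)]
    return cd, diff, significant
```

With 6 algorithms and 5 folds, CD ≈ 3.37. Only mean rank gaps larger than about 3.37 therefore count as significant, which makes the test conservative with so few blocks.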

Table 5 Paired rank comparison of algorithms in Acc, AUC, sensitivity, specificity, and precision of predictive model built on clinic risk indicators to PVST detected by algorithms

Discussion

This section discusses the experimental results presented in the Results section.

Clinic risk factor discussion

The results in Fig. 1 show that the proposed metric RD is useful for detecting clinical indexes with a higher risk degree. The red-circled clinical indexes in Fig. 1 are the clinical risk indicators of PVST, and RFA-PVST detects them automatically. Fig. 1 also shows that the risk factors vary from fold to fold, because the exemplars in each training subset of the 5-fold cross validation experiments differ. The number of risk factors across the five folds ranges from 2 to 8, with an average of 5. Anticoagulant therapy (ID 32) and antiplatelet aggregation therapy (ID 33) appear in all 5 risk indicator subsets detected by RFA-PVST. CHOL (ID 7), Ca (ID 17) and D-D (ID 31) each appear in 3 of the 5 subsets. This implies that anticoagulant therapy and antiplatelet aggregation therapy are the two most important risk indicators for predicting PVST. They are followed by CHOL, Ca and D-D, which are of comparable importance to one another.

The results in Table 1 show that antiplatelet aggregation therapy (ID 33) is the riskiest clinical index for PVST, followed by anticoagulant therapy (ID 32). WBC (ID 20) and INR (ID 27) have the lowest risk degrees. In addition, although the training samples vary, the top two clinical risk factors are the same in every fold of the 5-fold cross validation experiments. This further indicates that RFA-PVST is effective at finding the clinical risk factors of PVST.

The results in Table 2 show that the performance of the PVST predictive models on the test exemplars varies across folds in terms of Acc, AUC, sensitivity, specificity, precision, F-measure, FNR, FPR, FDR and MCC. The model built on a single clinical index, whether antiplatelet aggregation therapy was given, achieves the highest AUC of 0.91. It also achieves the best specificity of 0.75 and the best FPR of 0.25. The model built on 8 clinical indexes achieves the highest PVST predictive accuracy of 0.89. These indexes are anticoagulant therapy, antiplatelet aggregation therapy, RBC, D-D, CHOL, Ca, TT and weight. The same model achieves the best sensitivity, specificity, precision, F-measure, FNR, FPR, FDR and MCC, of 1, 0.75, 0.85, 0.92, 0, 0.25, 0.15 and 0.8, respectively. Its AUC of 0.84 is not the best among the 5 folds, but it is comparably good. We therefore conclude that these 8 clinical indexes are important indicators for a sound model that predicts whether PVST will occur after splenectomy with cardia devascularization for liver cirrhosis and portal hypertension.

The results in Table 3 show that an SVM classifier built on the risk indicators detected by RFA-PVST achieves the best mean predictive accuracy, AUC, specificity, precision, FPR and FDR. This model recognizes only 70% of PVST patients (sensitivity), whereas the SVM-RFE and Relief models recognize all of them. However, it correctly identifies 45% of non-PVST patients, and the SVM-RFE and Relief models identify none. In other words, the SVM-RFE and Relief models make the serious error of classifying every non-PVST patient as a PVST patient. The SVM classifier based on the risk indicators detected by RFA-PVST instead achieves a good trade-off between sensitivity and specificity.

Statistical test result discussion

The results in Table 4 show that p < 0.05 for every metric tested: Acc, AUC, sensitivity, specificity, and precision. We therefore conclude that there are significant differences among RFA-PVST and the compared algorithms (mRMR, SVM-RFE, Relief, S-weight and LLEScore). That is, the risk indicators detected by RFA-PVST are statistically significant.

Table 5 gives the multiple comparison results for the accuracy (Acc), AUC, sensitivity, specificity, and precision of the PVST predictive models, each built on the risk indicators detected by one algorithm. The results show that the clinical risk factors detected by RFA-PVST have much better predictive power for PVST than those detected by mRMR, SVM-RFE, Relief, S-weight and LLEScore. RFA-PVST is therefore effective at detecting the clinical risk indexes that predict whether PVST will occur after splenectomy plus cardia devascularization for liver cirrhosis and portal hypertension.

Conclusions

We propose a novel algorithm, RFA-PVST, to detect the clinical risk indicators of PVST in patients who undergo splenectomy and cardia devascularization for liver cirrhosis and portal hypertension. Discernibility and independence are defined for each clinical index. All clinical indexes are then scattered in a 2-dimensional space, with discernibility as the x-axis and independence as the y-axis. The clinical indexes in the top-right corner of this space are detected automatically as risk indicators. An SVM classifier is built on the detected risk indicators to predict whether PVST will occur after splenectomy plus cardia devascularization for liver cirrhosis and portal hypertension.

5-fold cross validation experiments on the clinical data of 92 patients show that antiplatelet aggregation therapy is the riskiest clinical index, followed by anticoagulant therapy. Receiving these two therapies may lead to PVST after splenectomy plus cardia devascularization for liver cirrhosis and portal hypertension. CHOL, Ca, and D-D are also important risk factors. Anticoagulant therapy, antiplatelet aggregation therapy, RBC, D-D, CHOL, Ca, TT, and weight together form the clinical risk indicators of PVST. The predictive model based on these 8 indicators is very powerful.

Furthermore, RFA-PVST was compared with typical existing feature selection algorithms: mRMR, SVM-RFE, Relief, S-weight and LLEScore. The comparison shows that RFA-PVST is very effective at detecting clinical risk indicators that distinguish PVST patients from non-PVST patients. The significance test among these algorithms reveals a strongly significant difference between RFA-PVST and these well-known feature selection algorithms. In addition, our finding that D-D is a clinical risk indicator of PVST agrees with references [17, 37].

We conclude that our study is significant for detecting the risk factors of PVST in patients who undergo splenectomy and cardia devascularization for liver cirrhosis and portal hypertension. It can help physicians make proper treatment decisions or early diagnoses for PVST patients. It also offers a new idea for the clinical treatment of other diseases.

Methods

This section first introduces the data used in this paper and the method used to preprocess them. Note that we are authorized to use the data on the condition that the patients' private information is deleted. The SVM learning machine is then briefly introduced. After that, we describe in detail the proposed RFA-PVST algorithm and the method for building an SVM classifier on the clinical risk factors it detects. Finally, we introduce the statistical test used to evaluate the significance of the difference between RFA-PVST and other classic methods.

Data used in this paper

This subsection will cover the data information and the data preprocessing methods used in this paper.

Raw data

We collected the clinical data of 92 patients who underwent splenectomy with cardia devascularization for liver cirrhosis and portal hypertension at one of the top-level hospitals in PR China. The patients are divided into two groups: 52 patients with PVST and 40 patients without PVST. The PVST group comprises 30 male and 22 female patients aged 20 to 71 (mean ± standard deviation, 47 ± 10). The non-PVST group comprises 22 male and 18 female patients aged 27 to 77 (47.9 ± 10.8). The data are described in Table 6.

Table 6 Data information

The causes of cirrhosis and portal hypertension in these 92 patients are distributed as follows.

  • HBV (hepatitis B virus) cirrhosis: 59 patients (64.13%).

  • HCV (hepatitis C virus) cirrhosis: 8 patients (8.70%).

  • Autoimmune cirrhosis: 7 patients (7.61%).

  • Idiopathic cirrhosis: 4 patients (4.35%).

  • Alcoholic cirrhosis: 2 patients (2.17%).

  • Idiopathic hypersplenism cirrhosis: 2 patients (2.17%).

  • Splenic infarction cirrhosis: 1 patient (1.09%).

  • Budd-Chiari syndrome: 1 patient (1.09%).

  • Gaucher disease: 1 patient (1.09%).

  • Combined HBV + HCV: 1 patient (1.09%).

  • Virus-untyped cirrhosis: 1 patient (1.09%).

  • Hypoferric anemia cirrhosis: 1 patient (1.09%).

  • Idiopathic thrombocytopenic purpura: 1 patient (1.09%).

  • Primary hypersplenism: 1 patient (1.09%).

  • Portal cavernous transformation: 1 patient (1.09%).

  • Liver cirrhosis: 1 patient (1.09%).

The clinical indexes of these 92 patients are listed in Table 7. There are 33 clinical indexes in total. Six are countable indicators: age, gender, weight, bleeding volume, anticoagulant therapy and antiplatelet aggregation therapy. The other 27 are measurable indexes, recorded daily or every other day after the operation together with the recording date. Two therapies were used to prevent PVST after the operation: anticoagulant therapy and antiplatelet aggregation therapy. Anticoagulant therapy consists of low-molecular-weight heparin calcium by subcutaneous injection at 4100 IU or 5000 IU once daily (qd), alone or combined with oral warfarin. Antiplatelet aggregation therapy consists of oral aspirin at 0.1 to 0.3 g once daily (qd), alone or together with dipyridamole at 25 mg or 50 mg three times daily (tid).

Table 7 Clinic indexes of splenectomy with cardia devascularization for cirrhotic and portal hypertension patients

Data preprocessing

Age and bleeding volume use their original recorded values. Gender, anticoagulant therapy, and antiplatelet aggregation therapy are treated as Boolean variables. Male is coded 0 and female 1. A patient without anticoagulant therapy is coded 0, and 1 otherwise, and antiplatelet aggregation therapy is coded the same way. For each measurable clinical index, the median of the recorded values is taken as the index value. A patient who developed PVST is labeled 1 (positive class); otherwise the label is −1 (negative class).
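The encoding rules above can be sketched as follows. This is a minimal illustration that assumes a hypothetical per-patient record layout (a dict with the field names used below). Only a few indexes are shown; the actual 33 indexes are listed in Table 7.

```python
from statistics import median

# Hypothetical field names; Table 7 of the paper lists the actual indexes.
THERAPY_FIELDS = ("anticoagulant_therapy", "antiplatelet_therapy")


def encode_patient(record):
    """Encode one raw patient record as (features, label) following the rules above."""
    features = {
        "age": record["age"],                              # original recorded value
        "bleeding_volume": record["bleeding_volume"],      # original recorded value
        "gender": 0 if record["gender"] == "male" else 1,  # male 0, female 1
    }
    for field in THERAPY_FIELDS:
        features[field] = 1 if record[field] else 0  # 1 = therapy given
    # Measurable indexes were recorded repeatedly; keep the median of the readings.
    for name, readings in record["measurements"].items():
        features[name] = median(readings)
    label = 1 if record["pvst"] else -1  # positive class = PVST
    return features, label
```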

The clinical indexes are measured on different scales. To keep these differences from influencing the experimental results, we first normalize the data by (1) and then discretize them by (2).

$$ {x}_{i,j}=\frac{x_{i,j}-\mathit{\min}\left({\mathbf{x}}_j\right)}{\mathit{\max}\left({\mathbf{x}}_j\right)-\mathit{\min}\left({\mathbf{x}}_j\right)} $$
(1)

where \( x_{i,j} \) is the value of the jth index for the ith patient, and \( \max(\mathbf{x}_j) \) and \( \min(\mathbf{x}_j) \) are the maximum and minimum values of the jth index, respectively.

$$ {d}_{i,j}=\Big\{{\displaystyle \begin{array}{ll}-1& {x}_{i,j}<{\mu}_j-{\sigma}_j\\ {}\ 1& {x}_{i,j}>{\mu}_j+{\sigma}_j\\ {}\ 0& \kern1.5em else\end{array}} $$
(2)

where \( \mu_j \) and \( \sigma_j \) are the mean and standard deviation of index j (1 ≤ j ≤ 33), and \( d_{i,j} \) is the discretized value of \( x_{i,j} \).
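Eqs. (1) and (2) can be applied column by column, as in the minimal sketch below. The text does not say which standard deviation estimator is used, so the sketch assumes the population standard deviation for σ_j. It also maps a constant index to 0 to avoid division by zero in (1). Both choices, and the function names, are assumptions.

```python
from statistics import mean, pstdev


def min_max_normalize(column):
    """Eq. (1): rescale one clinical index to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # assumption: a constant index is mapped to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def discretize(column):
    """Eq. (2): -1 below mu - sigma, 1 above mu + sigma, 0 otherwise."""
    mu, sigma = mean(column), pstdev(column)  # population SD (assumption)
    return [-1 if x < mu - sigma else 1 if x > mu + sigma else 0 for x in column]
```

For a column [0, 5, 10], normalization gives [0.0, 0.5, 1.0], and discretization then gives [-1, 0, 1].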

Support vector machines

SVM is a typical learning machine introduced by Vapnik in the 1990s [38]. It rests on VC (Vapnik-Chervonenkis) dimension theory and the structural risk minimization principle, which give it a sound theoretical basis and a concise mathematical model. It is suited to learning from small samples, and it generalizes well by making an optimal trade-off between model complexity and learning ability. SVM has been widely used in the biomedical field and has greatly influenced disease diagnosis and prediction [39,40,41,42]. A characteristic of SVM is that it uses kernel functions to map samples from the low-dimensional input space into a high-dimensional feature space. Samples that are not linearly separable in the input space can then become separable by an optimal hyperplane in the feature space.

Commonly used kernel functions include the following.

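  • linear kernel function: \( K(\mathbf{x},\mathbf{x}') = \mathbf{x}\cdot\mathbf{x}' \).

  • polynomial kernel function: \( K(\mathbf{x},\mathbf{x}') = (\mathbf{x}\cdot\mathbf{x}' + 1)^d \), where d is a positive integer.

  • radial basis kernel function: \( K(\mathbf{x},\mathbf{x}') = \exp(-\Vert\mathbf{x}-\mathbf{x}'\Vert^2/\sigma^2) \), where σ is a positive real number.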
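The three kernels can be written directly from the formulas above, as in this minimal pure-Python sketch. The RBF form here is exp(−‖x − x'‖²/σ²). If the γ in the grid-search step is meant in the LIBSVM sense, exp(−γ‖x − x'‖²), it corresponds to 1/σ².

```python
import math


def linear_kernel(x, y):
    """K(x, x') = x . x'"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2):
    """K(x, x') = (x . x' + 1)^d with positive integer d."""
    return (linear_kernel(x, y) + 1) ** d


def rbf_kernel(x, y, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / sigma^2) with sigma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)
```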

RFA-PVST algorithm

Feature selection chooses a subset of the original features that optimizes a specific criterion [43]. In essence, it represents samples in a low-dimensional space spanned by the selected features, while preserving as far as possible the pattern the samples had in the original high-dimensional space [43]. It is usually implemented by removing redundant and less important features and keeping the important ones. The selected features preserve the classification power of the original system. They also reduce the complexity of the classification model and improve its generalization [43,44,45]. Because the selected features keep their physical meaning, they are easy to interpret. For this reason, feature selection has attracted much attention from experts in statistics and machine learning and has been widely applied to disease diagnosis [39,40,41,42]. The selected features can help physicians make proper decisions and diagnoses for the related patients.

We propose the RFA-PVST algorithm to detect the clinical risk factors of PVST in patients who undergo splenectomy and cardia devascularization for cirrhosis and portal hypertension. The detected risk factors are then used to build a predictive model of PVST. The 92 patients are the exemplars and their clinical indexes are the features, so detecting clinical risk indexes is in fact a feature selection procedure.

We define the discernibility and independence of each clinical index. We then plot all clinical indexes in a 2-dimensional space, with discernibility as the x-axis and independence as the y-axis. The clinical indexes in the top-right corner of this space are risk factors because they have comparatively high discernibility and high independence. The less risky indexes lie in the bottom-left corner. To quantify how much a clinical index contributes to distinguishing PVST patients from non-PVST patients, we define its risk degree as the product of its discernibility and independence. This is the area of the rectangle enclosed by the index's coordinate lines and the axes. The clinical indexes whose risk degrees are much higher than those of the rest are then selected. Finally, an SVM classifier is built on these risk factors to predict whether PVST will develop after splenectomy and cardia devascularization for liver cirrhosis and portal hypertension.

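Let the training dataset be \( D = \{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n\} \in \mathbb{R}^{m\times n} \), where m is the number of patients and n is the number of clinical indexes. The column vector \( \mathbf{x}_j \) holds the values of clinical index j. Equations (3)–(7) define \( dis_j \), \( ind_j \), and \( RD_j \), the discernibility, independence, and risk degree of clinical index j (1 ≤ j ≤ n), respectively.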

Definition 1

Discernibility: Let \( N_0 \) and \( N_1 \) be the numbers of patients with and without PVST, respectively. Let S(j) be the Wilcoxon rank-sum (Mann-Whitney U) statistic of clinical index j, and let \( x_{i,j} \) be the value of clinical index j for sample i. The discernibility \( dis_j \) of clinical index j is defined in (3), and S(j) is calculated in (4).

$$ di{s}_j=\mathit{\max}\left\{{N}_0\ast {N}_1-S(j),\kern1em S(j)\right\} $$
(3)
$$ S(j)=\sum \limits_{k=1}^{N_0}\sum \limits_{i=1}^{N_1}\chi \left(\left({x}_{i,j}-{x}_{k,j}\right)\le 0\right) $$
(4)

where \( \chi \left(\cdotp \right)=\Big\{{\displaystyle \begin{array}{l}1,\kern1em \left({x}_{i,j}-{x}_{k,j}\right)\le 0\\ {}0,\kern1em \mathrm{otherwise}\end{array}} \).

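By Definition 1, \( dis_j \) measures how well clinical index j separates patients with PVST from those without PVST. Because S(j) and \( N_0 N_1 - S(j) \) sum to \( N_0 N_1 \), \( dis_j \) lies between \( N_0 N_1/2 \) and \( N_0 N_1 \). It equals \( N_0 N_1/2 \) when the pairwise comparisons are evenly split, and \( N_0 N_1 \) when every comparison falls the same way. It can therefore be used to evaluate whether clinical index j is a risk factor of PVST after splenectomy and cardia devascularization for liver cirrhosis and portal hypertension.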
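Eqs. (3) and (4) amount to a pair count. Over all (PVST, non-PVST) pairs of patients, count how often the non-PVST value is no greater than the PVST value, then take the larger of that count and its complement. The minimal sketch below assumes labels coded as in the preprocessing step (1 for PVST, −1 otherwise). It reads the summation limits of (4) as k running over PVST patients and i over non-PVST patients. The function name is hypothetical.

```python
def discernibility(values, labels):
    """dis_j from Eqs. (3)-(4) for one clinical index.

    values[p]: value of the index for patient p; labels[p]: 1 (PVST) or -1 (non-PVST).
    """
    pvst = [v for v, y in zip(values, labels) if y == 1]       # N0 patients
    non_pvst = [v for v, y in zip(values, labels) if y == -1]  # N1 patients
    # S(j): pairs whose non-PVST value x_i is <= the PVST value x_k.
    # Ties count toward S(j) because of the "<= 0" in the indicator of Eq. (4).
    s = sum(1 for xk in pvst for xi in non_pvst if xi - xk <= 0)
    return max(len(pvst) * len(non_pvst) - s, s)
```

Because ties count toward S(j), an index that is constant across patients also reaches the maximum \( N_0 N_1 \). Counting ties as 1/2, as in the usual Mann-Whitney U statistic, would avoid this, but that would be a variation on the definition in (4).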

Definition 2

Independence: The independence \( ind_j \) of clinical index j is defined in (5), where \( \mathbf{x}_j \) and \( \mathbf{x}_k \) are the vectors of clinical indexes j and k. It is a negative exponential function of the correlation coefficient pr between index j and its most correlated index k among the indexes with higher discernibility. For the clinical index with the highest discernibility, independence is instead defined from the correlation coefficient pr between j and its least correlated index k. The correlation coefficient pr can be any measure of correlation between two variables; we adopt the Pearson coefficient. To treat positive and negative correlations alike, we take the absolute value of the Pearson coefficient, as in (6). There, X and Y are the vectors of any two clinical indexes, and \( \overline{\mathbf{X}} \) and \( \overline{\mathbf{Y}} \) are the mean vectors of X and Y, respectively.

$$ ind_j=\begin{cases}\max\limits_{k\ne j}\exp\left(-pr\left(\mathbf{x}_j,\mathbf{x}_k\right)\right), & dis_j=\max\left\{dis_i \mid i=1,\cdots,n\right\}\\ \min\limits_{k:\, dis_k>dis_j}\exp\left(-pr\left(\mathbf{x}_j,\mathbf{x}_k\right)\right), & \text{otherwise}\end{cases} $$
(5)
$$ pr\left(\mathbf{X},\mathbf{Y}\right)=\frac{\left|{\left(\mathbf{X}-\overline{\mathbf{X}}\right)}^T\left(\mathbf{Y}-\overline{\mathbf{Y}}\right)\right|}{\sqrt{{\left\Vert \mathbf{X}-\overline{\mathbf{X}}\right\Vert}^2{\left\Vert \mathbf{Y}-\overline{\mathbf{Y}}\right\Vert}^2}} $$
(6)

This definition means that the less a clinical index is correlated with the other indexes, the stronger its independence, and vice versa. This matches the intuition that a weakly correlated index carries information the other indexes do not. In addition, definition (5) gives the clinical index with the highest discernibility the highest independence attainable for it. This is intended to ensure that it is selected as a risk factor of PVST.
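Eqs. (5) and (6) can be sketched as below. This minimal illustration assumes the indexes are given as equal-length columns and that the \( dis_j \) values have already been computed. A zero-variance column is assigned correlation 0, and every index tied at the maximum discernibility is handled by the first case of (5). Both choices, and the function names, are assumptions.

```python
import math


def abs_pearson(x, y):
    """Eq. (6): absolute Pearson correlation of two index vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:  # assumption: a constant index is uncorrelated with the rest
        return 0.0
    return abs(sum(a * b for a, b in zip(dx, dy))) / den


def independence(columns, dis):
    """Eq. (5): ind_j for every index, given its column and its dis_j."""
    n = len(columns)
    top = max(dis)
    ind = []
    for j in range(n):
        if dis[j] == top:
            # Most discernible index: candidates are all other indexes,
            # and the max below picks the LEAST correlated one.
            others = [k for k in range(n) if k != j]
        else:
            # Otherwise: candidates are the more discernible indexes,
            # and the min below picks the MOST correlated one.
            others = [k for k in range(n) if dis[k] > dis[j]]
        scores = [math.exp(-abs_pearson(columns[j], columns[k])) for k in others]
        ind.append(max(scores, default=1.0) if dis[j] == top else min(scores))
    return ind
```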

Definition 3

Risk Degree (RD): The risk degree of clinic index j is defined as the product of its discernibility and independence in (7), which is the area of the rectangle enclosed by its coordinate lines and axes, where the discernibility is the x-coordinate and independence the y-coordinate.

$$ R{D}_j= di{s}_j\times in{d}_j $$
(7)

The main steps of the proposed RFA-PVST are described as follows.

Input: training dataset \( D \in \mathbb{R}^{m\times n} \), where m is the number of patients and n is the number of clinical indexes; label vector Y indicating whether each patient has PVST.

Output: Set S of risk factors.

BEGIN
  let S = ∅, F = {all clinical indexes};
  FOR j = 1 TO n DO
    calculate dis_j for clinical index j by Eq. (3);
    calculate ind_j for clinical index j by Eq. (5);
    calculate RD_j for clinical index j by Eq. (7);
  END FOR
  plot all clinical indexes in the 2-dimensional space with discernibility as the x-axis and independence as the y-axis;
  select the clinical indexes in the top-right corner, i.e., those with much higher RD than the rest, to form the set S of risk factors;
END
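The last two steps compute \( RD_j \) by Eq. (7) and pick the indexes in the top-right corner. The text describes the selected indexes only as those whose risk degree is much higher than the rest, without a formal cut-off. The sketch below therefore uses a hypothetical stand-in rule, not the paper's rule: sort RD in descending order and cut at the largest drop between consecutive values. The function names are also hypothetical.

```python
def risk_degrees(dis, ind):
    """Eq. (7): RD_j = dis_j * ind_j for every clinical index j."""
    return [d * i for d, i in zip(dis, ind)]


def select_risk_factors(rd):
    """Hypothetical cut-off: keep the indexes above the largest drop in sorted RD."""
    order = sorted(range(len(rd)), key=lambda j: rd[j], reverse=True)
    if len(order) < 2:
        return order
    gaps = [rd[order[p]] - rd[order[p + 1]] for p in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda p: gaps[p])
    return order[:cut + 1]
```

For instance, with dis = [10, 9, 3, 2] and ind = [0.9, 0.8, 0.5, 0.9], RD ≈ [9.0, 7.2, 1.5, 1.8], and the rule keeps indexes 0 and 1.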

Constructing predictive models

5-fold cross validation experiments are conducted with SVM learning machines that use RBF (radial basis function) kernels. RFA-PVST detects the risk factors of PVST, and the SVM classifier is built on them. To evaluate how well RFA-PVST detects the factors that recognize PVST patients, this classifier is compared with classifiers built on the indexes selected by existing feature selection algorithms.

Selecting parameters for SVM

The kernel function and its parameters are very important for an SVM learning machine [46]. We use the RBF kernel and grid search to find the optimal penalty parameter C and kernel parameter γ. Grid search first sets a range for each of C and γ. It then evaluates each pair (C, γ) by cross validation on the training subset and selects the pair with the highest cross validation accuracy.
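The grid search can be sketched independently of any SVM library by passing in a scoring callback. In this minimal sketch, `evaluate(C, gamma)` is a hypothetical function that should return the cross-validation accuracy on the training subset. The paper does not report its grid ranges. The exponential grids C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3 are an assumption, taken from the ranges suggested in the LIBSVM practical guide.

```python
def exp2_grid(lo, hi, step=2):
    """Powers of two 2**lo, 2**(lo+step), ..., 2**hi."""
    return [2.0 ** e for e in range(lo, hi + 1, step)]


def grid_search(evaluate, c_grid, gamma_grid):
    """Return (best_score, best_C, best_gamma); the first pair wins ties."""
    best = None
    for c in c_grid:
        for gamma in gamma_grid:
            score = evaluate(c, gamma)  # e.g. CV accuracy on the training subset
            if best is None or score > best[0]:
                best = (score, c, gamma)
    return best


C_GRID = exp2_grid(-5, 15)      # assumption: LIBSVM guide's suggested range
GAMMA_GRID = exp2_grid(-15, 3)  # assumption: LIBSVM guide's suggested range
```

This exhaustive search trains 11 × 10 = 110 cross-validated models per training subset. That is cheap for 92 patients, and it avoids relying on any particular library's tuning utilities.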

Building SVM model for predicting PVST

5-fold cross-validation experiments are conducted on the clinic data we collected from patients who underwent splenectomy plus cardia devascularization for liver cirrhosis and portal hypertension. The patients with PVST and those without PVST are each partitioned into 5 balanced parts, yielding 5 subsets of exemplars for the 5-fold cross-validation. The RFA-PVST algorithm is run on the training subset to obtain the risk factors that form the set S. A new training subset TSnew is then constructed whose exemplars contain only the risk factors in S. The best pair of parameters (C, γ) is found on TSnew. Finally, the SVM classifier is built with the best (C, γ) on TSnew to predict PVST.
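The protocol can be sketched as follows, assuming labels coded 1 for PVST and 0 for non-PVST. Each class is split separately into 5 near-equal parts, so every fold keeps the class balance, and the risk factor set S is recomputed from each training subset only, so the test subset never influences feature selection. `select_features` is a hypothetical stand-in for RFA-PVST, and the grid search and SVM fitting on TS_new are left out.

```python
from __future__ import annotations

import random
from typing import Callable, Iterator, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Partition sample indices into k folds, splitting each class separately
    so that every fold keeps roughly the same PVST / non-PVST proportion."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal class members round-robin
    return folds


def cv_splits(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    select_features: Callable[[list, list], list[int]],
    k: int = 5,
    seed: int = 0,
) -> Iterator[tuple[list, list, list, list, list[int]]]:
    """Yield (TS_new, y_train, test_new, y_test, S) for each fold.

    select_features is a hypothetical stand-in for RFA-PVST: it sees only
    the training subset and returns the column indices of the set S.
    """
    folds = stratified_folds(y, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        S = select_features([list(X[i]) for i in train_idx], [y[i] for i in train_idx])
        ts_new = [[X[i][j] for j in S] for i in train_idx]  # training rows restricted to S
        test_new = [[X[i][j] for j in S] for i in test_idx]  # same columns for testing
        yield ts_new, [y[i] for i in train_idx], test_new, [y[i] for i in test_idx], S
```

Each yielded TS_new would then go through the (C, γ) grid search and SVM training, and the resulting classifier would be scored on the matching test subset.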

Evaluation methods

The power of the proposed RFA-PVST is evaluated in two ways. First, it is evaluated by the performance of the SVM classifier built on the risk indexes selected by RFA-PVST. Second, it is evaluated by statistical significance tests comparing this classifier with SVM classifiers built on the indexes selected by other popular feature selection algorithms.

Model evaluation

The performance of the SVM classifier is tested on the exemplars in the test subset in terms of predictive accuracy (Acc), sensitivity, specificity, precision, F-measure, FPR (false positive rate), FNR (false negative rate), FDR (false discovery rate), AUC (area under the ROC curve) and MCC (Matthews correlation coefficient). ROC stands for receiver operating characteristic; the ROC curve is a widely used tool for evaluating classifiers, and the AUC summarizes it as a single number [47, 48]. These metrics are defined in eqs. (8)–(17) based on the confusion matrix in Table 8. The power of RFA-PVST is compared with that of the available feature selection algorithms mRMR [32], SVM-RFE [33], Relief [34], S-weight [35] and LLEScore [36].

Table 8 Confusion matrix
$$ Acc=\frac{TP+ TN}{TP+ FP+ FN+ TN} $$
(8)
$$ sensitivity=\frac{TP}{TP+ FN} $$
(9)
$$ specificity=\frac{TN}{FP+ TN} $$
(10)
$$ precision=\frac{TP}{TP+ FP} $$
(11)
$$ F\text{-}measure=\frac{2\times precision\times sensitivity}{precision+ sensitivity}=\frac{2 TP}{2 TP+ FP+ FN} $$
(12)
$$ FPR=\frac{FP}{FP+ TN}=1- specificity $$
(13)
$$ FNR=\frac{FN}{TP+ FN}=1- sensitivity $$
(14)
$$ FDR=\frac{FP}{TP+ FP}=1- precision $$
(15)
$$ MCC=\frac{TP\ast TN- FP\ast FN}{\sqrt{\left( TP+ FN\right)\ast \left( FP+ TN\right)\ast \left( TP+ FP\right)\ast \left( FN+ TN\right)}} $$
(16)
$$ AUC=\frac{\sum \limits_{i=1}^{n_0}{r}_i-\frac{n_0\times \left({n}_0+1\right)}{2}}{n_0\times {n}_1} $$
(17)

where in (17), n0 and n1 are the numbers of patients in the test subset with and without PVST, respectively, i.e., the numbers of exemplars in the positive and negative classes; n = n0 + n1 is the total number of patients in the test subset; and ri is the rank of the ith PVST patient when all n patients in the test subset are sorted in ascending order of their predicted probability of being a PVST patient, with the smallest rank being 1.
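Eq. (17) is the rank-sum (Mann–Whitney) form of the AUC. The sketch below computes it from predicted PVST probabilities by ranking all test patients in ascending order of probability (rank 1 = lowest) and summing the ranks of the PVST patients. Giving tied probabilities the average of their ranks is a standard convention assumed here, since the text does not say how ties are handled.

```python
from __future__ import annotations

from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Ranks in ascending order starting at 1; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for p in range(start, end + 1):
            ranks[order[p]] = mean_rank
        start = end + 1
    return ranks


def auc_rank_sum(prob_pvst: Sequence[float], is_pvst: Sequence[bool]) -> float:
    """AUC = (sum of PVST ranks - n0(n0 + 1)/2) / (n0 * n1), as in eq. (17)."""
    n0 = sum(1 for p in is_pvst if p)  # PVST (positive) patients
    n1 = len(is_pvst) - n0  # non-PVST (negative) patients
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one PVST and one non-PVST patient")
    ranks = average_ranks(prob_pvst)
    rank_sum = sum(r for r, p in zip(ranks, is_pvst) if p)
    return (rank_sum - n0 * (n0 + 1) / 2) / (n0 * n1)
```

The result equals the fraction of (PVST, non-PVST) pairs in which the PVST patient receives the higher predicted probability, with ties counted as one half.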

From the above definitions, sensitivity is the proportion of true PVST patients that the SVM predictive model detects, specificity is the proportion of patients without PVST that it correctly recognizes as non-PVST, and precision is the proportion of true PVST patients among the patients it predicts to have PVST. F-measure is the harmonic mean of precision and sensitivity.
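Eqs. (8)–(16) can all be computed from the four confusion-matrix counts. The sketch below transcribes them directly; the counts in the usage example are hypothetical, not results from the study, and returning 0 when a denominator is zero is an assumed convention.

```python
from __future__ import annotations

import math


def _ratio(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when a denominator is zero.
    return num / den if den else 0.0


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Metrics of eqs. (8)-(16) from confusion-matrix counts (PVST = positive)."""
    mcc_den = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return {
        "Acc": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, fp + tn),
        "precision": _ratio(tp, tp + fp),
        "F-measure": _ratio(2 * tp, 2 * tp + fp + fn),
        "FPR": _ratio(fp, fp + tn),
        "FNR": _ratio(fn, tp + fn),
        "FDR": _ratio(fp, tp + fp),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts for one test subset.
    for name, value in confusion_metrics(tp=8, fp=2, fn=1, tn=9).items():
        print(f"{name}: {value:.3f}")
```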

Statistical test

A statistical test is performed between the SVM classifiers built on the risk indexes detected by RFA-PVST and those built on the indexes detected by the popular feature selection algorithms in [32,33,34,35,36], to verify whether RFA-PVST differs significantly from them, that is, whether the risk indicators detected by RFA-PVST are statistically significant for predicting PVST. Friedman's test [49, 50] is adopted to detect significant differences between algorithms because it is considered preferable for comparing algorithms over datasets and does not assume normally distributed data. Once a significant difference is detected, a multiple comparison test is applied as a post hoc test to identify which pairs of algorithms differ significantly. Friedman's test is performed at α = 0.05 on the Acc, AUC, sensitivity, specificity, and precision of the SVM predictive models of PVST, where each model uses the same number of risk indexes detected by the respective algorithm.
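A minimal sketch of the Friedman statistic, without the tie correction or the post hoc test. Scores are arranged as an N × k table with one column per algorithm and one row per block; what the blocks were in this study (for example, folds or different numbers of risk indexes) is not stated, so the rows here are an assumption. Algorithms are ranked within each row, and χ²_F = 12/(N k (k + 1)) Σ_j T_j² − 3N(k + 1), where T_j is the rank sum of algorithm j. Under the null hypothesis χ²_F approximately follows a χ² distribution with k − 1 degrees of freedom; for the six algorithms compared here, the 5% critical value is about 11.07.

```python
from __future__ import annotations

from typing import Sequence


def within_block_ranks(row: Sequence[float]) -> list[float]:
    """Rank the scores in one block (1 = lowest); tied scores share their mean rank."""
    return [
        sum(1 for w in row if w < v) + (sum(1 for w in row if w == v) + 1) / 2
        for v in row
    ]


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic (no tie correction) for an N x k score table."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular N x k table with N >= 2 and k >= 2")
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(within_block_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(t * t for t in rank_sums) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical scores: 3 blocks (rows) x 3 algorithms (columns).
    table = [[0.91, 0.84, 0.80], [0.88, 0.85, 0.79], [0.90, 0.78, 0.82]]
    print(round(friedman_statistic(table), 3))
```

Ranking lowest-first or highest-first gives the same statistic, because reversing the ranks leaves the sum of squared deviations of the T_j unchanged.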

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ALB:

albumin

ALT:

alanine transaminase

APTT:

activated partial thromboplastin time

AST:

aspartate aminotransferase

AUC:

area under ROC curve

BUN:

blood urea nitrogen

BV:

bleeding volume

Ca:

calcium

CHOL:

cholesterol

CRE:

creatinine

DBIL:

direct bilirubin

D-D:

D dimer

FDR:

false discovery rate

FIB:

fibrinogen

FN:

false negative

FNR:

false negative rate

FP:

false positive

FPR:

false positive rate

GLU:

glucose

HBV:

Hepatitis B virus

HCV:

Hepatitis C virus

HGB:

hemoglobin

INR:

International normalized ratio

K:

potassium

LLEScore:

Locally Linear Embedding score

LY1:

lymphocyte count of 1st test

LY2:

lymphocyte count of 2nd test

MCC:

Matthews correlation coefficient

mRMR:

minimum redundancy maximum relevance

Na:

sodium

NE1:

neutrophil count of 1st test

NE2:

neutrophil count of 2nd test

PLT:

Platelets

PT:

prothrombin time

PVST:

portal vein system thrombosis

RBC:

Red blood cell

RBF:

radial basis function

RD:

risk degree

RFA-PVST:

risk factor analysis for PVST

ROC:

receiver operating characteristic curve

SVM:

support vector machine

SVM-RFE:

SVM recursive feature elimination

TBIL:

total bilirubin

TN:

true negative

TP:

total protein

TP:

true positive

TT:

thrombin time

VC:

Vapnik-Chervonenkis

WBC:

White blood cell

References

  1. Parikh S, Shah R, Kapoor P. Portal vein thrombosis. Am J Med. 2010;123:111–9.

  2. Rattner DW, Ellman L, Warshaw AL. Portal vein thrombosis after elective splenectomy: an underappreciated, potentially lethal syndrome. Arch Surg. 1993;128:565–70.

  3. Stamou KM, Toutouzas KG, Kekis PB, et al. Prospective study of the incidence and risk factors of postsplenectomy thrombosis of the portal, mesenteric, and splenic veins. Arch Surg. 2006;141:663–9.

  4. Francoz C, Valla D, Durand F. Portal vein thrombosis, cirrhosis, and liver transplantation. J Hepatol. 2012;57:203–12.

  5. Tao YF, Teng F, Wang ZX, et al. Liver transplant recipients with portal vein thrombosis: a single center retrospective study. Hepatob Pancreat Dis. 2009;8:34–9.

  6. Delaitre B, Champault G, Barrat C, et al. Laparoscopic splenectomy for hematologic diseases. Study of 275 cases. French Society of Laparoscopic Surgery. Ann Chir. 2000;125:522–9.

  7. Romano F, Caprotti R, Conti M, et al. Thrombosis of the splenoportal axis after splenectomy. Langenbeck Arch Surg. 2006;391:483–8.

  8. Wu S, Wu Z, Zhang X, et al. The incidence and risk factors of portal vein system thrombosis after splenectomy and pericardial devascularization. Turk J Gastroenterol. 2015;26:423–8.

  9. Chawla YK, Bodh V. Portal vein thrombosis. J Clin Exp Hepatol. 2015;5:22–40.

  10. Raja K, Jacob M, Asthana S. Portal vein thrombosis in cirrhosis. J Clin Exp Hepatol. 2014;4:320–31.

  11. Jiang GQ, Bai DS, Chen P, et al. Predictors of portal vein system thrombosis after laparoscopic splenectomy and azygoportal disconnection: a retrospective cohort study of 75 consecutive patients with 3-months follow-up. Int J Surg. 2016;30:143–9.

  12. Ikeda M, Sekimoto M, Takiguchi S, et al. High incidence of thrombosis of the portal venous system after laparoscopic splenectomy: a prospective study with contrast-enhanced CT scan. Ann Surg. 2005;241:208.

  13. Winslow ER, Brunt LM, Drebin JA, et al. Portal vein thrombosis after splenectomy. Am J Surg. 2002;184:631–5.

  14. Soyer T, Ciftci AO, Tanyel FC, et al. Portal vein thrombosis after splenectomy in pediatric hematologic disease: risk factors, clinical features, and outcome. J Pediatr Surg. 2006;41:1899–902.

  15. Li MX, Zhang XF, Liu ZW, et al. Risk factors and clinical characteristics of portal vein thrombosis after splenectomy in patients with liver cirrhosis. Hepatob Pancreat Dis. 2013;12:512–9.

  16. Danno K, Ikeda M, Sekimoto M, et al. Diameter of splenic vein is a risk factor for portal or splenic vein thrombosis after laparoscopic splenectomy. Surgery. 2009;145:457–64.

  17. Zocco MA, Di Stasio E, De Cristofaro R, et al. Thrombotic risk factors in patients with liver cirrhosis: correlation with MELD scoring system and portal vein thrombosis development. J Hepatol. 2009;51:682–9.

  18. Kinjo N, Kawanaka H, Akahoshi T, et al. Risk factors for portal venous thrombosis after splenectomy in patients with cirrhosis and portal hypertension. Br J Surg. 2010;97:910–6.

  19. Lai W, Lu SC, Li GY, et al. Anticoagulation therapy prevents portal-splenic vein thrombosis after splenectomy with gastroesophageal devascularization. World J Gastroentero. 2012;18:3443.

  20. Delgado MG, Seijo S, Yepes I, et al. Efficacy and safety of anticoagulation on patients with cirrhosis and portal vein thrombosis. Clin Gastroenterol H. 2012;10:776–83.

  21. Zhang X, Wang Y, Yu M, et al. Effective prevention for portal venous system thrombosis after splenectomy: a meta-analysis. J Laparoendosc Adv S. 2017;27:247–52.

  22. Tripodi A, Primignani M, Chantarangkul V, Dell’Era A, Clerici M, de Franchis R, Colombo M, Mannucci PM. An imbalance of pro- vs anti-coagulation factors in plasma from patients with cirrhosis. Gastroenterology. 2009;137(6):2105–11.

  23. Tripodi A, Mannucci PM. The coagulopathy of chronic liver disease. N Engl J Med. 2011;365(2):147–56.

  24. Tripodi A. The coagulopathy of chronic liver disease: is there a causal relationship with bleeding? No. Eur J Intern Med. 2010;21(2):65–9.

  25. Loffredo L, Pastori D, Farcomeni A, et al. Effects of anticoagulants in patients with cirrhosis and portal vein thrombosis: a systematic review and meta-analysis. Gastroenterology. 2017;153:480–7.e1.

  26. Mancuso A. Classification of portal vein thrombosis in cirrhosis. Gastroenterology. 2017;152:1247.

  27. Qi X, Valla DC, Guo X. Anticoagulation for portal vein thrombosis in cirrhosis: selection of appropriate patients. Gastroenterology. 2018;154:760–1.

  28. Chen H, Lv Y, Han G. Anticoagulation for portal vein thrombosis in liver cirrhosis: not only Recanalize the portal vein. Gastroenterology. 2018;154:758.

  29. Mancuso A, Politi F, Maringhini A. Portal vein Thromboses in cirrhosis: to treat or not to treat? Gastroenterology. 2018;154:758.

  30. Wood CP, Rowe IA. What are the benefits of anticoagulation for portal vein thrombosis in individuals with cirrhosis? Gastroenterology. 2018;154:759–60.

  31. Zhang N, Yao Y, Xue W, Wu S. Early prophylactic anticoagulation for portal vein system thrombosis after splenectomy: a systematic review and meta-analysis. Biomed Rep. 2016;5(4):483–90.

  32. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE T Pattern Anal. 2005;27:1226–38.

  33. Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.

  34. Kira K, Rendell LA. The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press; 1992. p. 129–34.

  35. Xie JY, Gao HC. A stable gene subset selection algorithm for cancers. LNCS. 2015;9085:111–22.

  36. Li JG, Pang ZN, Su L, et al. Feature selection method LLE score used for tumor gene expressive data. J Beijing Univ Technol. 2015;41:1145–50.

  37. He S, He F. Predictive model of portal venous system thrombosis in cirrhotic portal hypertensive patients after splenectomy. Int J Clin Exp Med. 2015;8:4236.

  38. Vapnik V. The nature of statistical learning theory. Springer Science & Business Media: New York; 1999.

  39. Akay MF. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst Appl. 2009;36:3240–7.

  40. Xie JY, Wang CX. Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases. Expert Syst Appl. 2011;38:5809–15.

  41. Chang Y, Kim N, Lee Y, et al. Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. Comput Biol Med. 2012;42:1157–64.

  42. Gabere MN, Hussein MA, Aziz MA. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer. OncoTargets Ther. 2016;9:3313.

  43. Fu KS, Min PJ, Li TJ. Feature selection in pattern recognition. IEEE T Syst Sci Cyb. 1970;6:33–9.

  44. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3:1157–82.

  45. Xie JY, Wang MZ, Zhou Y, et al. Coordinating discernibility and independence scores of variables in a 2D space for efficient and accurate feature selection. LNAI. 2016;9773:116–27.

  46. Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM T Intel Syst Tec. 2011;2:27.

  47. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27:861–74.

  48. Wang R, Tang K. Feature selection for maximizing the area under the ROC curve. In: 2009 IEEE International Conference on Data Mining Workshops (ICDMW’09). IEEE; 2009. p. 400–5.

  49. Borg A, Lavesson N, Boeva V. Comparison of clustering approaches for gene expression data. In: Proceedings of the SCAI. 2013. p. 55–64.

  50. Xie JY, Gao HC, Xie WX, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inform Sci. 2016;354:19–40.

Acknowledgements

Not applicable.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 22, 2019: Decipher computational analytics in digital health and precision medicine. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-22 .

Funding

This study is supported in part by the National Natural Science Foundation of China under Grant Nos. 61673251 and 81373157, the National Key Research and Development Program of China under Grant No. 2016YFC0901900, the Fundamental Research Funds for the Central Universities under Grant No. GK201701006, the Scientific and Technological Achievements Transformation and Cultivation Funds of Shaanxi Normal University under Grant No. GK201806013, and the Innovation Funds of Graduate Programs at Shaanxi Normal University under Grant Nos. 2015CXS028, 2016CSY009 and 2018TS078.

The aforementioned funding bodies supported the authors in conducting this study, guaranteed the validation of the study design, the collection, analysis, and interpretation of data, and the writing of the manuscript, and also supported the publication of the results by covering the publication fee of this paper.

Publication costs of this article are funded by the National Natural Science Foundation of China under Grant No. 61673251.

Author information

Contributions

J. Xie is the main supervisor and principal investigator of this study. She proposed the RFA-PVST algorithm to detect the clinic risk factors of PVST for splenectomy and cardia devascularization patients for cirrhosis and portal hypertension, supervised M. Wang in designing and implementing the related experiments, analyzed the experimental results, and wrote and revised the manuscript. M. Wang designed and implemented all algorithms in this study and obtained the experimental results. L. Ding and M. Xu collected the data for this study, and M. Xu also took part in the discussion of the experimental results. S. Wu supervised L. Ding and M. Xu in collecting the data, took part in the discussion of the experimental results with J. Xie and M. Wang, and wrote the background of the manuscript. Y. Yao supervised L. Ding and M. Xu in collecting the data. Q. Liu supervised L. Ding and M. Xu in collecting the data and took part in the discussion of the experimental results. S. Xu took part in the discussion of the experimental results and revised the manuscript. All authors have read, accepted, and approved the final version of the manuscript.

Corresponding authors

Correspondence to Juanying Xie, Shengli Wu, Shengquan Xu or Qingguang Liu.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Ethics Committee of the First Affiliated Hospital of Xi’an Jiaotong University. The hospital is the first level hospital in PR China. All experiments were performed in accordance with the principles of the Declaration of Helsinki. All participants provided their written informed consent to participate in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Wang, M., Ding, L., Xu, M. et al. A novel method detecting the key clinic factors of portal vein system thrombosis of splenectomy & cardia devascularization patients for cirrhosis & portal hypertension. BMC Bioinformatics 20 (Suppl 22), 720 (2019). https://doi.org/10.1186/s12859-019-3233-3

  • Published:

  • DOI: https://doi.org/10.1186/s12859-019-3233-3

Keywords