Article

A Novel Method to Rank Influential Nodes in Complex Networks Based on Tsallis Entropy

1 School of Computer Science and Engineering, Central South University, Changsha 470075, China
2 School of Information Technology and Management, Hunan University of Finance and Economics, Changsha 410205, China
3 School of Engineering and Built Environment, Glasgow Caledonian University, Glasgow G4 0BA, UK
* Authors to whom correspondence should be addressed.
Entropy 2020, 22(8), 848; https://doi.org/10.3390/e22080848
Submission received: 27 June 2020 / Revised: 24 July 2020 / Accepted: 25 July 2020 / Published: 31 July 2020

Abstract

With the rapid development of social networks, it has become extremely important to evaluate the propagation capabilities of the nodes in a network. Related research has wide applications, such as in network monitoring and rumor control. However, the current research on the propagation ability of network nodes is mostly based on the analysis of the degree of nodes. The method is simple, but the effectiveness needs to be improved. Based on this problem, this paper proposes a method that is based on Tsallis entropy to detect the propagation ability of network nodes. This method comprehensively considers the relationship between a node’s Tsallis entropy and its neighbors, employs the Tsallis entropy method to construct the TsallisRank algorithm, and uses the SIR (Susceptible, Infectious, Recovered) model for verifying the correctness of the algorithm. The experimental results show that, in a real network, this method can effectively and accurately evaluate the propagation ability of network nodes.

1. Introduction

With the expansion of the Internet, people are paying more and more attention to social networks such as WeChat, Facebook, and Instagram. When analyzing social networks, mining influential nodes becomes increasingly important. For example, the analysis of collaborator networks [1] distinguishes the academic influence of different authors, which gives researchers, especially those unfamiliar with a field, scientific evidence that helps them enter the field quickly; it also plays an important supporting role in promoting scientific and technological exchange. Sentiment analysis, or opinion mining [2], uses natural language processing tools to extract subjective information from text in order to assess user attitudes, which helps enterprises find product promotion channels, understand user psychology, and obtain market information. Online advertising [3] can select the most influential users (online celebrities) to reach users with brand affinity; it can be used for product recommendation, and it can use the celebrity effect to continuously expose consumers to products, which is of great significance for product marketing. In this line of research, influential nodes are considered to have better communication capabilities, which means that they can disseminate information to more network users; identifying influential nodes is therefore an important factor in the successful dissemination of information in social networks.
In the study of node influence in complex networks, the earliest methods are based on node degree, such as degree centrality [4], and they use the network locations of nodes to evaluate node influence. These methods evaluate the importance of nodes from the number and relative distribution of connected edges, which is simple and effective; however, node degree is local information, so the influence and function of a node in the whole network are not effectively described. Furthermore, the importance of a node in a complex network also depends on the network structure around it. In the study of the structural complexity of complex networks, scholars have investigated many structural characteristics of networks, such as closeness centrality [5], betweenness centrality [6], eigenvector centrality [7], Katz centrality [8], and entropy. Entropy is an important tool for evaluating the characteristics of the network structure: the more orderly the structure of a complex network is, the smaller its entropy value, and vice versa. Entropy can also be used to describe the complexity of the overall network structure and the statistical characteristics of a complex network.
For example, in 2012, Chen et al. [9] proposed the structural entropy to measure the structural characteristics of complex networks. Xu et al. [10] proposed a path entropy-based approach to link predictions in real networks. Qiao et al. [11] proposed a new mechanism for quantitatively measuring centrality based on a graph decomposition and domain node entropy redefining entropy centrality model. However, these traditional methods lack the ability to capture the global information of nodes, and they seldom consider the locations of nodes in the network. To solve these problems, this paper proposes a novel method for evaluating the propagation ability of the nodes in a network: TsallisRank method (TRank). This method combines a node’s propagation ability and the degree of a node’s neighbors, fully measures the correlation between the primary and secondary neighbors of a node, and uses the Tsallis entropy in order to evaluate the complexity of the network structure.
The rest of this paper is organized as follows. Section 2 outlines the related work that influences this study, and Section 3 introduces the motivation and details of the method. Section 4 presents the experimental results and their evaluation, and Section 5 gives the conclusions.

2. Related Works

So far, scholars have put forward many measures and methods for studying influence in complex networks [12,13]. Among them are influence analysis methods based on a node's own attributes, mainly including evaluation methods based on node degree, the K-shell decomposition method, and so on. The degree centrality analysis method was proposed by Bonacich [14]. It mainly considers the degree of a node: the larger the degree, the greater the influence. Kitsak et al. [15] proposed a fast node ranking method, called K-shell decomposition, which considers the network locations of nodes when determining their influence. Bae and Kim [16] considered both the degree of a node and its coreness, which yields a more concise measure. Zeng et al. [17] proposed a new method based on K-shell decomposition, mixed degree decomposition (MDD), in which MDD weighs the remaining degree and the removed degree of nodes during K-shell decomposition. On the basis of K-shell, Wang et al. [18] considered not only the K value after node decomposition, but also the number of iterations each time.
Another family of methods ranks nodes by eigenvector centrality, which considers the quantity and quality of adjacent nodes at the same time. These methods are mainly improvements of the PageRank and HITS algorithms. The PageRank algorithm defines the influence propagation of nodes as the propagation of importance scores. Starting from an initial state, each node in the network distributes its own PageRank (PR) value equally among the nodes to which it points; the PR value of each node is updated until the algorithm converges, and the importance of the nodes is finally determined from the final PR values. Weng, Lim, et al. [19] proposed the TwitterRank algorithm based on PageRank, which measures the topic similarity between users and the impact of the link structure. Chen et al. [20] analyzed three aspects (post quality, the proportion of forwarding behavior, and interest similarity), calculated the relative impact of forwarding behavior, and improved PageRank with the unique structural and behavioral characteristics of a microblog network. Wang et al. [21] proposed a consistency algorithm called ConformRank to find the most influential users. Emotion integration refers to how users maintain the same emotion as the original users. The consistency weight evaluates the consistency of user emotion.
In addition, entropy is an effective tool for describing the complexity and uncertainty of social impact, and so it has been widely used in social networks. Peng [22] proposed two concepts, the friend entropy and the interaction frequency entropy, in order to measure the social impact. Sathanur and Jandhyala [23] introduced the transfer entropy to measure the impact of directed causality. Yin and Deng [24] used heuristic rules to measure the utility of each neighbor in the network and the Shannon entropy to measure the uncertainty of each node. Xiao et al. [25] proposed a new structural entropy based on the automorphism partition to accurately quantify the heterogeneity or disorder of a network system. Nie et al. [26] considered the local information of the correlation between each node and its neighboring nodes to propose the mapping entropy.

3. Motivation and Proposed Approach

This section explains the origin of the Tsallis entropy and derives the proposed algorithm and its flow step by step.

3.1. Tsallis Entropy

Entropy is a concept in physics. Entropy connects a microstate with a macro characteristic and uncertainty with information measurement, and it measures order and disorder. In 1988, the Brazilian physicist Tsallis [27] proposed the Tsallis entropy that is based on the existing Boltzmann entropy. Its formula is as follows.
$$S_q = k\,\frac{1 - \sum_{i=1}^{W} p_i^{q}}{q - 1} \quad (q \in \mathbb{R})$$
where $S_q$ is the value of the Tsallis entropy, $W$ is the number of particles in a micro system, $k$ is the Boltzmann constant, $q$ is the Tsallis parameter that describes the interaction between elements, and $p_i$ represents the probability of occurrence of microparticles. In this paper, the Tsallis entropy formula is adapted to measure the complexity of a network structure and thereby detect the propagation ability of complex network nodes. The formula is as follows.
$$T_i = \sum_{j=1}^{W} \frac{p_{ij}^{\,q_{ij}} - p_{ij}}{1 - q_{ij}}$$
where $T_i$ represents the entropy value of node $i$ in its local area network; node $i$ and the nodes directly connected to it constitute a network with a radius of 1, which is called the local area network of node $i$. $p_{ij}$ represents the probability associated with neighbor $j$ of node $i$ in the local area network, $W$ is the number of nodes in the local area network, and $q_{ij}$ represents the system parameter of neighbor $j$ of node $i$. When calculating the propagation ability in a complex network, this paper uses the closeness centrality to represent the interaction parameters of nodes and the system parameters, which reflects the overall role of a node in the network and makes it a reasonable basis for evaluating the structural complexity of complex networks.
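As a consistency check on the per-term form used for the node entropy, the following short derivation assumes a single parameter $q$ for all terms, normalized probabilities ($\sum_j p_j = 1$), and $k = 1$; these simplifications are ours and are made only for illustration. Under them the per-term sum coincides with the standard Tsallis form:

$$\sum_{j=1}^{W} \frac{p_j^{\,q} - p_j}{1 - q} = \frac{\sum_{j} p_j^{\,q} - \sum_{j} p_j}{1 - q} = \frac{1 - \sum_{j=1}^{W} p_j^{\,q}}{q - 1}$$

Each term also has a finite limit as $q \to 1$, obtained with L'Hôpital's rule:

$$\lim_{q \to 1} \frac{p^{\,q} - p}{1 - q} = \lim_{q \to 1} \frac{p^{\,q} \ln p}{-1} = -p \ln p$$

so the sum reduces to the Shannon entropy $-\sum_j p_j \ln p_j$. This limit is worth keeping in mind whenever a Tsallis parameter equals 1, since the quotient itself is undefined there.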

3.2. TsallisRank

In the study of influence in complex networks, many methods are based on node degree. However, degree alone cannot fully measure the influence of a node; taking the degrees of a node's neighbors into account may improve the accuracy of the estimated influence. For example, as shown in Figure 1, node 4 and node 8 have the same degree centrality of 6, so a purely degree-based method assigns them the same propagation ability. However, two of the neighbors of node 8, node 11 and node 12, have no other neighbors; therefore, the propagation ability of node 8 should be smaller than that of node 4. We therefore assume that the propagation ability of a node is positively related to the degrees of its neighboring nodes, and on this basis we propose the TsallisRank algorithm.
The TsallisRank algorithm, which is based on the Tsallis entropy, is divided into two parts. The first part calculates the parameters needed by the later formulas. First, the closeness centrality of each node is calculated and used to compute the Tsallis parameter $q$; then a local area network is built for each node, and its first-order and second-order neighbor probability sets are calculated. Finally, the Tsallis parameters and probability sets are used to calculate the first-order neighbor entropy and the second-order neighbor entropy, respectively. The second part integrates the two neighbor entropies into the influence ability of a node and then calculates the final TsallisRank through the neighborhood core and the extended neighborhood core. Please refer to Figure 2 for the specific steps.

3.2.1. Parameter Computing

  • Calculate closeness centrality
For a network, we define $G = (V, E)$ as the connected graph, $n = |V|$ as the number of nodes, $m = |E|$ as the number of edges, $d(i, j)$ as the length of the shortest path between node $i$ and node $j$, and $C_i$ as the closeness centrality of node $i$. It is defined as follows.
$$C_i = \frac{n - 1}{\sum_{j \neq i} d(i, j)}$$
  • Calculate the Tsallis parameters
Kitsak et al. [15] believe that the influence ability of a node is determined by its network location. Therefore, the most influential nodes will maintain closer relationships with their surrounding nodes. $q_i$ represents the Tsallis parameter of node $i$ and is defined as follows, where $C_{max}$ is the maximum value of the closeness centrality in the network.
$$q_i = 1 + C_{max} - C_i$$
  • Calculating the probability set
First, we build the local area network of node $v_i$. The degree of node $v_i$ is denoted by $k_i$, and $N_i$ is the set of its neighbors. $k_i^{1}$ is the sum of the degrees of all the neighbors of node $v_i$, that is, $k_i^{1} = \sum_{v_j \in N_i} k_j$. Similarly, $k_i^{2}$ is the sum of the $k_j^{1}$ values of the neighbors of node $v_i$, which accounts for what this paper calls the second-class neighbors, that is, $k_i^{2} = \sum_{v_j \in N_i} k_j^{1}$.
$$p_i^{1} = \frac{k_j}{k_i^{1}}$$
$$p_i^{2} = \frac{k_j^{1}}{k_i^{2}}$$
where $p_i^{1}$ is defined as the first-order probability set of node $v_i$, and $p_i^{2}$ is the second-order probability set of node $v_i$; each set contains one element for every neighbor $v_j \in N_i$.
  • Neighbor entropy
Substituting the Tsallis parameter and the probability sets obtained above into Equation (3) yields Equations (5) and (6).
$$Ts_1(v_i) = \sum_{v_j \in N_i} \frac{(p_j^{1})^{q_i} - p_j^{1}}{1 - q_i}$$
$$Ts_2(v_i) = \sum_{v_j \in N_i} \frac{(p_j^{2})^{q_i} - p_j^{2}}{1 - q_i}$$
where $Ts_1(v_i)$ is the first-order neighbor entropy of node $v_i$ and $Ts_2(v_i)$ is the second-order neighbor entropy of node $v_i$; $p_j^{1}$ and $p_j^{2}$ denote the elements of the first-order and second-order probability sets of $v_i$ that belong to neighbor $v_j$.
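The parameter computing steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the graph is assumed to be connected, undirected, and stored as an adjacency dictionary; the Tsallis parameter is assumed to read $q_i = 1 + C_{max} - C_i$; and the function names (`closeness`, `tsallis_term`, `neighbor_entropies`) are hypothetical. Because that form gives $q_i = 1$ for the most central node, the sketch falls back to the Shannon limit $-p \ln p$ there, which is our own handling of the undefined quotient.

```python
import math
from collections import deque


def closeness(adj):
    """Closeness C_i = (n - 1) / sum_j d(i, j), via BFS on a connected graph."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[source] = (n - 1) / sum(dist.values())
    return result


def tsallis_term(p, q):
    """One summand (p^q - p) / (1 - q); uses the limit -p ln p when q = 1."""
    if abs(1.0 - q) < 1e-12:
        return -p * math.log(p)
    return (p ** q - p) / (1.0 - q)


def neighbor_entropies(adj):
    """Return (q, k2, ts1, ts2), each a dict keyed by node."""
    c = closeness(adj)
    c_max = max(c.values())
    q = {v: 1.0 + c_max - c[v] for v in adj}            # assumed form of q_i
    k = {v: len(adj[v]) for v in adj}                    # degree k_i
    k1 = {v: sum(k[u] for u in adj[v]) for v in adj}     # k_i^1
    k2 = {v: sum(k1[u] for u in adj[v]) for v in adj}    # k_i^2
    ts1 = {v: sum(tsallis_term(k[u] / k1[v], q[v]) for u in adj[v]) for v in adj}
    ts2 = {v: sum(tsallis_term(k1[u] / k2[v], q[v]) for u in adj[v]) for v in adj}
    return q, k2, ts1, ts2


# Toy input: the three-node path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
q, k2, ts1, ts2 = neighbor_entropies(path)
```

On this toy path the middle node has the largest closeness, so its parameter is 1 and both of its neighbor entropies equal $\ln 2$, while the end nodes, each with a single neighbor, have zero entropy.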

3.2.2. Coreness Centrality

  • Calculating the influence ability
The coefficient $\alpha_i$ is defined in this paper in order to integrate the first-order neighbor entropy and the second-order neighbor entropy. It is a ratio that weights the second-order entropy when the two values are combined.
$$IC(v_i) = Ts_1(v_i) + \alpha_i \, Ts_2(v_i)$$
$$\alpha_i = \frac{k_i^{2}}{\max_{v_h \in V}\left(k_h^{2}\right)}$$
where $IC(v_i)$ represents the influence ability of node $v_i$, which describes the mutual influence ability of the primary and secondary neighbors of node $v_i$, and $\max_{v_h \in V}(k_h^{2})$ represents the maximum value of the sum of the degrees of the secondary neighbors over all nodes in the network. The value range of $\alpha_i$ is $0 < \alpha_i \le 1$, with $\alpha_i = 1$ for the node that attains the maximum.
  • Computing the neighborhood core
Bae and Kim [16] put forward the concept of the neighborhood core in their paper when improving the K-shell algorithm. This paper draws on that concept and uses the following equation.
$$C_{nc}(v_i) = \sum_{v_j \in N_i} IC(v_j)$$
That is, for node $v_i$, the $IC$ values of all its neighbors are summed to obtain the neighborhood core Cnc (core neighborhood centrality) of node $v_i$.
  • TsallisRank
$$TRank(v_i) = C_{nc+}(v_i) = \sum_{v_j \in N_i} C_{nc}(v_j)$$
where TRank is the abbreviation of TsallisRank and is used in its place from here on. For node $v_i$, summing the Cnc values of all its neighbors gives the extended core neighborhood $C_{nc+}$ of node $v_i$. In this paper, TRank is set equal to $C_{nc+}$.
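The three aggregation steps above (influence ability, neighborhood core, and TRank) can be sketched as follows. The function name `trank_scores` and the adjacency-dictionary representation are assumptions made for illustration, and the input values are illustrative numbers for a three-node path a - b - c rather than results from the paper.

```python
import math


def trank_scores(adj, ts1, ts2, k2):
    """Combine neighbor entropies into IC, Cnc and TRank (hypothetical helper).

    adj: node -> list of neighbors
    ts1, ts2: first- and second-order neighbor entropies per node
    k2: sum of second-order neighbor degrees per node
    """
    k2_max = max(k2.values())
    # IC(v_i) = Ts1(v_i) + alpha_i * Ts2(v_i), with alpha_i = k_i^2 / max_h k_h^2
    ic = {v: ts1[v] + (k2[v] / k2_max) * ts2[v] for v in adj}
    # Cnc(v_i): sum of IC over the neighbors of v_i
    cnc = {v: sum(ic[u] for u in adj[v]) for v in adj}
    # TRank(v_i) = Cnc+(v_i): sum of Cnc over the neighbors of v_i
    trank = {v: sum(cnc[u] for u in adj[v]) for v in adj}
    return ic, cnc, trank


# Illustrative inputs for the path a - b - c (not data from the paper).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ts1 = {"a": 0.0, "b": math.log(2), "c": 0.0}
ts2 = {"a": 0.0, "b": math.log(2), "c": 0.0}
k2 = {"a": 2, "b": 4, "c": 2}

ic, cnc, trank = trank_scores(path, ts1, ts2, k2)
ranking = sorted(trank, key=trank.get, reverse=True)
```

With these inputs the middle node b receives the largest TRank value, since both end nodes pass their neighborhood core back to it.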

3.3. Algorithm Description

To make the TRank algorithm described by the formulas above easier to follow, its pseudocode is given in Algorithm 1.
Algorithm 1: TRank algorithm.
Input: Network G(V, E)
Output: TRank value for each node v_i
1. Find the neighboring nodes N_i of node v_i
2. Compute q_i for node v_i
3. For node v_j in N_i do
4.   compute ratio1 = degree(v_j) / sum(degree(all neighbors of v_i))
5.   Ts_1(v_i) += (pow(ratio1, q_i) − ratio1) / (1 − q_i)
6. End For
7. For node v_j in N_i do
8.   compute second_neighbor_degree = sum over v_h in N_i of sum(degree(all neighbors of v_h))
9.   compute ratio2 = sum(degree(all neighbors of v_j)) / second_neighbor_degree
10.  Ts_2(v_i) += (pow(ratio2, q_i) − ratio2) / (1 − q_i)
11. End For
12. compute IC(v_i) = Ts_1(v_i) + α_i Ts_2(v_i)
13. For node v_j in N_i do
14.   Cnc(v_i) += IC(v_j)
15. End For
16. For node v_j in N_i do
17.   TRank(v_i) += Cnc(v_j)
18. End For
In this algorithm, lines 3 to 11 are the core. Lines 3 to 6 calculate $Ts_1$ for each node, with time complexity $O(nk)$, where $n$ is the number of nodes and $k$ is related to the number of neighbors. Lines 7 to 11 calculate $Ts_2$ for each node, with time complexity $O(nkm)$, where $m$ is related to the number of secondary neighbors of a node. Therefore, the overall complexity of the algorithm is $O(nkm)$.

4. Experiment

In this section, we will evaluate the comprehensive ability of TRank from three aspects: identification, correctness, and efficiency. At the same time, we will use the infectious disease model to simulate the process of information transmission in the real network, so as to better evaluate the transmission ability of nodes.

4.1. Network Datasets

In this paper, six random synthetic Barabasi Albert (BA) scale-free networks [28] of different sizes, four random synthetic Fractional Preferential Attachment (FPA) scale-free networks [29] of different ‘f’ parameter, and 10 real networks of different sizes are selected. Table 1 shows the analysis data of 10 random synthetic scale-free networks, and Table 2 shows the analysis data of 10 real networks, including the number of nodes, the number of edges, the average degree, the maximum degree, the assortativity, and the clustering coefficient.
Some of these network datasets are detailed below.
(1) The BA network is a scale-free network whose degree distribution follows a power law; it contains a few nodes with unusually high degree compared with the other nodes of the network. We set the number of nodes and the average degree of the BA model to synthesize six random networks of different sizes.
(2) The FPA network is a generalization of the BA network. Compared with the BA network, the FPA network is acyclic. The properties of the FPA model are controlled by the 'f' parameter, where f ∈ (0,1); for f = 1, the FPA model implements the classical BA model. We vary the f parameter of the FPA model to synthesize four networks of the same size.
(3) The Karate network describes the 34 members of a karate club. After more than two years of continuous observation, Zachary recorded 78 edges to represent their relationships according to the level of interpersonal communication. Because of a conflict between the instructor and the manager, their relationship broke down and the club split into two factions.
(4) The Dolphin data set has 62 nodes, representing dolphins from two families. It took more than seven years of continuous observation to form the data set. Lusseau et al. counted the degree of interaction between each pair of dolphins and used 159 edges to describe the relationships between them.
(5) The Jazz dataset has 198 nodes, each of which is a jazz musician, and the edges represent two musicians playing together in a band.
(6) Elegans represents the metabolic network of Caenorhabditis elegans. The nodes of the metabolic network are substrates, which are connected by links that represent the actual metabolic reactions.
(7) The Email dataset represents the email communication network of Rovira i Virgili University in Tarragona, southern Catalonia, Spain. Each node is a user, and each edge indicates that at least one email has been sent.
(8) Euroroad is the international E-road network, which is mainly located in Europe. The network is undirected. Each node represents a city, and an edge between two nodes indicates that they are connected by an E-road.
(9) The Yeast data set describes the interaction network composed of proteins, which can be used to discover the interactions among thousands of proteins. Recognizing correlations in such large-scale data sets is very important for biology.
(10) The Hamsterster network contains the friendships and family links between the users of the website.
(11) Powergrid is an undirected network that contains information about the western power grid of the United States of America. The connection between two points represents a power line, and a node can be a generator, transformer or substation.
(12) PGP is the pretty good privacy (PGP) algorithm user interaction network.

4.2. TsallisRank Algorithm Recognition Analysis

In order to better evaluate the TRank algorithm, this paper uses degree centrality (DC) [4], K-shell (KS) [15], local entropy (LE) [30], mixed degree decomposition (MDD) [17], and extended neighborhood core centrality (Cnc+) [16] as the comparison methods.
This part of the experiment mainly verifies the ability of the algorithm to distinguish the influential nodes in the network. The verification methods are the D method, the CCDF method, and the M method, complemented by the Jaccard similarity coefficient.
  • D method
$$D = \frac{\text{number of distinct ranks}}{n}$$
where $n$ represents the number of network nodes. The maximum value of the function $D$ is 1, which means that each node in the network has a unique influence ability and every node can be effectively distinguished. The minimum value of $D$ is $1/n$, which means that all the nodes have the same influence ability; in this case, the recognition ability of the algorithm is the worst. In this paper, the D method is applied to each algorithm in order to compare the recognition ability of the methods.
  • CCDF method
$$CCDF(r) = \frac{n - \sum_{i=1}^{r} n_i}{n}$$
where $n$ represents the total number of network nodes and $n_i$ represents the number of nodes that occupy rank $i$ in a ranking list. The faster the function value falls as the rank $r$ increases, the worse the ranking distribution of the method. The CCDF (complementary cumulative distribution function) method thus characterizes the ranking distributions of different methods, with the ranking variable $r$ determining the value of the function.
  • M method
$$M(R) = \left( 1 - \frac{\sum_{r \in R} n_r (n_r - 1)}{n (n - 1)} \right)^{2}$$
where $n$ is the number of nodes in the ranking list $R$, and $n_r$ is the number of nodes that share the same rank $r$. If all nodes have the same ranking, the value of $M$ is 0; if all nodes have different rankings, the value of $M$ is 1. The closer the value of $M$ is to 1, the better the recognition ability of the ranking list.
  • Jaccard similarity coefficient
$$J_c(X, Y) = \frac{|X(c) \cap Y(c)|}{|X(c) \cup Y(c)|}$$
where the Jaccard similarity coefficient is used to determine the degree of similarity of two rankings $X$ and $Y$, and $X(c)$ represents the set of the first $c$ entries of list $X$. The closer the value of $J_c$ is to 1, the more similar the two rankings are. When one of the lists is a reference ranking, a high $J_c$ also indicates the high accuracy of the ranking R.
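The four measures above are simple to compute from a score dictionary. The sketch below is one possible reading, assuming that nodes with equal scores share a rank, that rank 1 is the highest score, and that $n$ in the M method is the number of nodes; all function names are hypothetical.

```python
from collections import Counter


def rank_groups(scores):
    """Sizes of the tie groups, ordered from the best (highest) score down."""
    counts = Counter(scores.values())
    return [counts[s] for s in sorted(counts, reverse=True)]


def distinct_metric(scores):
    """D: number of distinct ranks divided by the number of nodes."""
    return len(set(scores.values())) / len(scores)


def ccdf(scores):
    """CCDF(r) for r = 1 .. number of distinct ranks."""
    n = len(scores)
    remaining, values = n, []
    for size in rank_groups(scores):
        remaining -= size
        values.append(remaining / n)
    return values


def monotonicity(scores):
    """M(R): equals 1 when all ranks differ and 0 when all nodes are tied."""
    n = len(scores)
    tied = sum(size * (size - 1) for size in rank_groups(scores))
    return (1 - tied / (n * (n - 1))) ** 2


def jaccard_top(x_order, y_order, c):
    """Jaccard similarity of the first c entries of two ordered lists."""
    x, y = set(x_order[:c]), set(y_order[:c])
    return len(x & y) / len(x | y)


# Toy scores with one tie between b and c (illustrative only).
scores = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0}
d_value = distinct_metric(scores)
ccdf_values = ccdf(scores)
m_value = monotonicity(scores)
j_value = jaccard_top(["a", "b", "c", "d"], ["a", "c", "d", "b"], 2)
```

For the toy scores, three distinct ranks among four nodes give $D = 0.75$, and the single tie lowers $M$ to $(1 - 2/12)^2 = 25/36$.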
Experiment 1:
Experiment 1 verifies the recognition ability of the algorithm in random synthetic scale-free networks with the D method and the M method. As shown in Figure 3, some centrality methods do not perform well. For example, DC, MDD, and KS have lower M and D values in all networks. In BA networks, LE, Cnc+, and TRank perform best. KS performs worst, and its performance gets worse as the number of nodes in the BA network decreases. In FPA networks, TRank performs best: although the M values of LE and Cnc+ are very high, their D values are low.
Experiment 2:
Experiment 2 uses the D method to verify the recognition ability of the algorithm in real networks. As shown in Table 3, D(x) gives the D value of method x for the different datasets. As in Experiment 1, DC, MDD, and KS do not perform well. In addition, Cnc+ is highly recognizable in some networks, and LE performs better than TRank only in the Karate network.
Experiment 3:
Experiment 3 plots the rank on the abscissa and the number of nodes sharing that rank (the frequency) on the ordinate, so that the recognition ability of the different methods can be seen more clearly. The closer the frequency of nodes is to 1, the better the recognition ability of the ranking method. As shown in Figure 4, in the four real networks of Karate, Dolphin, Jazz, and Elegans, the frequencies of DC, KS, and MDD are scattered above 1, while those of LE and TRank always stay around 1.
Experiment 4:
Experiment 4 will explore the ranking distributions of different methods. It uses the CCDF to draw the distributions of the four networks, including Karate, Dolphin, Jazz, and Elegans, using different algorithms, as shown in Figure 5. From Figure 5, we can see that DC, KS, and MDD fall rapidly in the four networks. In the small Karate network, the performance is very good. Cnc+ is very close to the TRank, but the TRank still decreases at a slower rate.
Experiment 5:
Table 4 shows the results of the M method applied to the ranking lists of the different methods for the 10 real networks. In this table, M(x) gives the M value of method x for the different datasets. As can be seen from Table 4, LE, Cnc+, and TRank all have extremely high scores, whereas DC, KS, and MDD perform poorly on multiple networks.

4.3. Algorithm Correctness

This paper uses the SIR model [31] to obtain the propagation impact of network nodes in order to verify the correctness of the ranking methods. In the simulation process [32], a node $v$ is initialized in the infected state, and all other nodes are set to the susceptible state. In each iteration, every infected node tries to infect each of its susceptible neighbors with probability $\beta$ and then changes to the recovered state; this process is repeated until no node in the network is in the infected state. At the end of the infection process, the number of recovered nodes is regarded as the propagation ability of node $v$. In the infectious disease model, $\beta$ is set to values near the infection threshold $\beta_{th} \sim \langle k \rangle / \langle k^2 \rangle$, where $\langle k \rangle$ and $\langle k^2 \rangle$ are the average degree and the second-order average degree (the mean of the squared degrees), respectively. Because the disease model is stochastic, the process is simulated repeatedly for each node and the results are averaged. The simulations follow these rules: for networks with $|V| < 100$, the simulation is repeated $10^4$ times; for $100 < |V| < 10^4$, it is repeated $10^3$ times; and for $|V| > 10^4$, it is repeated 100 times.
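The simulation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the graph is an adjacency dictionary, every infected node recovers after exactly one step, the threshold is taken as the mean degree divided by the mean squared degree, and the function names and the fixed random seed are hypothetical choices.

```python
import random


def epidemic_threshold(adj):
    """beta_th ~ <k> / <k^2>, computed from the degree sequence."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean_k = sum(degrees) / len(degrees)
    mean_k2 = sum(d * d for d in degrees) / len(degrees)
    return mean_k / mean_k2


def sir_spread(adj, seed, beta, rng):
    """One SIR run from a single seed; returns the final number of recovered nodes."""
    infected, recovered = {seed}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        recovered |= infected       # infected nodes recover after one step
        infected = newly_infected
    return len(recovered)


def propagation_ability(adj, seed, beta, runs, rng=None):
    """Average outbreak size over repeated runs (the sigma score of a node)."""
    rng = rng or random.Random(0)
    return sum(sir_spread(adj, seed, beta, rng) for _ in range(runs)) / runs


# Toy input: the three-node path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
beta_th = epidemic_threshold(path)
sigma = {v: propagation_ability(path, v, beta_th, runs=1000) for v in path}
```

Sorting the nodes by `sigma` gives the reference ranking against which a centrality ranking can be compared.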
At the end of the SIR simulations, the ranking σ is obtained and compared with the ranking R calculated by each algorithm using Kendall's tau correlation coefficient [33]. In order to quantify the correctness of the different methods, let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the paired rankings of lists X and Y. For any pair $(x_i, y_i)$ and $(x_j, y_j)$, if ($x_i > x_j$ and $y_i > y_j$) or ($x_i < x_j$ and $y_i < y_j$), the pair is consistent. If ($x_i < x_j$ and $y_i > y_j$) or ($x_i > x_j$ and $y_i < y_j$), the pair is inconsistent. If $x_i = x_j$ or $y_i = y_j$, the pair is neither consistent nor inconsistent. With these relations, Kendall's tau rank correlation coefficient $\tau$ is defined as Equation (15), where $n_c$ and $n_d$ represent the numbers of consistent (c) and inconsistent (d) pairs, respectively, and $n$ is the size of the rank list.
$$\tau(R, \sigma) = \frac{n_c - n_d}{0.5\, n (n - 1)}$$
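The coefficient can be computed directly by counting pairs. The sketch below assumes two equally long lists of scores and counts tied pairs as neither consistent nor inconsistent, which is the usual convention; the function name `kendall_tau` is hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau: (n_c - n_d) / (0.5 * n * (n - 1)) over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1      # both lists order the pair the same way
            elif sign < 0:
                discordant += 1      # the lists disagree on the pair
            # sign == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / (0.5 * n * (n - 1))


tau_same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
tau_one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

The pairwise loop is quadratic in the list length, which is acceptable for the network sizes used here.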
Experiment 6:
Experiment 6 calculates the correlation coefficient τ between the ranking lists of the different methods and the σ ranking obtained with the infectious disease model. Table 5 presents the specific results. In this table, $\beta_{th}$ is the threshold of the infection probability β, and the τ(x, σ) column shows Kendall's tau correlation coefficient between method x and σ. It can be seen from the table that, compared with the other methods, the ranking R calculated by TRank is highly correlated with σ. Only in the Karate and PGP networks does Cnc+ exceed TRank and become the method most correlated with σ. DC, KS, and LE have low correlations with σ.
Experiment 7:
In this experiment, the Jaccard similarity coefficient is used to determine the degree of similarity of two rankings. For a list X, X(c) represents the set of its first c entries. The larger $J_c$ is, the more similar the two rankings are; a high similarity to the σ ranking in turn indicates a higher accuracy of the ranking R. Figure 6 shows the results for the four networks of Email, Euroroad, Yeast, and Hamsterster. For networks with fewer than 200 nodes, the maximum value of the ranking variable c is the number of network nodes. For networks with more than 200 nodes, the maximum value of c is 200. According to the experimental results, TRank performs well in the four networks; Cnc+ performs similarly to TRank only in the Hamsterster network; and the remaining algorithms (DC, KS, LE, and MDD) rise slowly at the beginning and finally remain stable below the TRank curve.
Experiment 8:
In this paper, β is used as a variable to carry out the SIR simulation in the Karate, Euroroad, Elegans, and Yeast networks, and different σ lists are obtained. The Kendall’s tau correlation coefficient rankings of each method and different σ lists are calculated. The results of the experiment are shown in Figure 7. According to Figure 7, in the Dolphin and Euroroad networks, as β increases, the correlation between various algorithms and σ shows a downward trend. However, in the Elegans and Yeast networks, the curves of LE, DC, MDD, and KS first decline and then rise, and only the curves of Cnc+ and TRank first rise and then fall.
Experiment 9:
In order to simulate the infection process more realistically, we modify the SIR model by adding a natural decay function, $\beta_t = \beta_0 e^{-t}$, where $\beta_0$ is the initial value of the infection probability and $t$ is the iteration step, so that the infection probability decreases gradually with each iteration. The modified SIR simulation is carried out in the Dolphins and Jazz networks, and the procedure is the same as in Experiment 8. The results of the experiment are shown in Figure 8. As $\beta_0$ increases, the correlations of LE, DC, MDD, and KS in the two networks show a downward trend, while that of Cnc+ gradually decreases in the Dolphins network and gradually increases in the Jazz network. The correlation of TRank is the highest and increases gradually in both networks.
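The decay schedule can be illustrated in a few lines. The sketch assumes that the decay reads $\beta_t = \beta_0 e^{-t}$ and that the first iteration corresponds to $t = 0$; both the step indexing and the function name `decayed_beta` are assumptions made for illustration.

```python
import math


def decayed_beta(beta0, t):
    """Infection probability at iteration t, assuming beta_t = beta0 * exp(-t)."""
    return beta0 * math.exp(-t)


# Infection probabilities over the first five iterations for beta0 = 0.5.
schedule = [decayed_beta(0.5, t) for t in range(5)]
```

In a simulation loop, the fixed probability β would be replaced by `decayed_beta(beta0, t)` at step `t`, so later infection attempts succeed less often.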

4.4. Algorithm Efficiency

Experiment 10:
In this part, we look at the time consumed by each algorithm in different networks. The experimental environment is as follows: python = 3.6, numpy = 1.16, and pandas = 0.24. As shown in Figure 9, the time consumption of the six algorithms differs considerably across the nine real networks. The degree-based DC, KS, and Cnc+ are relatively simple, so their time consumption is very small and remains stable. The slightly more complex algorithms, namely MDD, LE, and TRank, take longer.

5. Discussion and Conclusions

In this paper, we propose an effective ranking algorithm, TsallisRank (TRank), which addresses the inability of traditional methods to capture the global information of nodes. The method takes the positions of nodes in the network into account and considers the influence of the numbers of primary and secondary neighbors on a node’s propagation ability. Furthermore, we use Tsallis entropy to evaluate the characteristics of the network structure, which allows the influential nodes in the network to be evaluated more reliably. By simulating the SIR infection process on real networks, the diffusion ability of each node is obtained, and the ranking list of each ranking method is then compared with it. Kendall’s tau correlation coefficient analysis shows that TRank can effectively rank the influential nodes; compared with other methods, such as DC, KS, MDD, Cnc+, and LE, TRank is more accurate and effective. However, TRank does not perform well in terms of time consumption: it is more complex than DC, which leads to a large increase in computing time. This is a limitation of the algorithm that needs to be addressed in follow-up work.

Author Contributions

Resources, S.L.; Software, Y.Z.; Writing—original draft, J.Z.; Writing—review & editing, X.C. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work described in this paper is supported by NSF 61802120, Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology (Hunan University of Finance and Economics) 2017TP1025, HNNSF 2019JJ50018, and the scientific research project of Hunan Provincial Education Department No. 18B480.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, N.; Gillet, D. Identifying Influential Scholars in Academic Social Media Platforms. IEEE Comput. Soc. 2013, 608–614. [Google Scholar] [CrossRef] [Green Version]
  2. Li, D.; Shuai, X.; Sun, G.; Tang, J.; Ding, Y.; Luo, Z. Mining topic-level opinion influence in microblog. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management; ACM: New York, NY, USA, 2012. [Google Scholar]
  3. Sun, J.; Tang, J. A Survey of Models and Algorithms for Social Influence Analysis. In Social Network Data Analytics; Springer: Boston, MA, USA, 2011. [Google Scholar]
  4. Freeman, L.C. Centrality in Social Networks’ Conceptual Clarification. Soc. Netw. 1979, 1, 215–239. [Google Scholar] [CrossRef] [Green Version]
  5. Sabidussi, G. The Centrality Index of a Graph. Psychometrika 1966, 31, 581–603. [Google Scholar] [CrossRef]
  6. Freeman, L.C. A Set of Measures of Centrality Based on Betweenness. Sociometry 1977, 40, 35–41. [Google Scholar] [CrossRef]
  7. Bonacich, P.; Lloyd, P. Eigenvector-Like Measures of Centrality for Asymmetric Relations. Soc. Netw. 2001, 23, 191–201. [Google Scholar] [CrossRef]
  8. Katz, L. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–43. [Google Scholar] [CrossRef]
  9. Chen, D.; Lü, L.; Shang, M.S.; Zhang, Y.C.; Zhou, T. Identifying influential nodes in complex networks. Physica A 2012, 391, 1777–1787. [Google Scholar] [CrossRef] [Green Version]
  10. Xu, Z.; Pu, C.; Yang, J. Link prediction based on path entropy. Physica A 2016, 456, 294–301. [Google Scholar] [CrossRef] [Green Version]
  11. Qiao, T.; Shan, W.; Zhou, C. How to identify the most powerful node in complex networks? A novel entropy centrality approach. Entropy 2017, 19, 614. [Google Scholar] [CrossRef] [Green Version]
  12. Liao, Z.; He, D.; Chen, Z.; Fan, X.; Zhang, Y.; Liu, S. Exploring the characteristics of issue-related behaviors in github using visualization techniques. IEEE Access 2018, 6, 24003–24015. [Google Scholar] [CrossRef]
  13. Liao, Z.; Zhao, B.; Liu, S.; Jin, H.; He, D.; Yang, L.; Zhang, Y.; Wu, J. A prediction model of the project life-span in open source software ecosystem. Mob. Netw. Appl. 2019, 24, 1382–1391. [Google Scholar] [CrossRef] [Green Version]
  14. Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 1972, 2, 113–120. [Google Scholar] [CrossRef]
  15. Kitsak, M.; Gallos, L.K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H.E.; Makse, H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010, 6, 888–893. [Google Scholar] [CrossRef] [Green Version]
  16. Bae, J.; Kim, S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A 2014, 395, 549–559. [Google Scholar] [CrossRef]
  17. Zeng, A.; Zhang, C.J. Ranking spreaders by decomposing complex networks. Phys. Lett. A 2013, 377, 1031–1035. [Google Scholar] [CrossRef] [Green Version]
  18. Wang, Z.; Zhao, Y.; Xi, J.; Du, C. Fast ranking influential nodes in complex networks using a k-shell iteration factor. Physica A 2016, 461, 171–181. [Google Scholar] [CrossRef]
  19. Weng, J.; Lim, E.P.; Jiang, J.; He, Q. Twitterrank: Finding topic-sensitive influential twitterers. In Proceedings of the Third ACM International Conference on Web Search and Data Mining; ACM: New York, NY, USA, 2010. [Google Scholar]
  20. Chen, W.; Cheng, S.; He, X.; Jiang, F. Influencerank: An efficient social influence measurement for millions of users in microblog. In 2012 Second International Conference on Cloud and Green Computing; IEEE: Piscataway, NJ, USA, 2012; pp. 563–570. [Google Scholar]
  21. Wang, Q.; Jin, Y.; Cheng, S.; Yang, T. ConformRank: A conformity-based rank for finding top-k influential users. Physica A 2017, 474, 39–48. [Google Scholar] [CrossRef]
  22. Peng, S.; Li, J.; Yang, A. Entropy-based social influence evaluation in mobile social networks. In International Conference on Algorithms and Architectures for Parallel Processing; Springer: Berlin/Heidelberger, Germany, 2015; pp. 637–647. [Google Scholar]
  23. Sathanur, A.V.; Jandhyala, V. An activity-based information-theoretic annotation of social graphs. In Proceedings of the 2014 ACM Conference on Web Science; ACM: New York, NY, USA, 2014; pp. 187–191. [Google Scholar]
  24. Yin, L.; Deng, Y. Toward uncertainty of weighted networks: An entropy-based model. Physica A 2018, 508, 176–186. [Google Scholar] [CrossRef]
  25. Xiao, Y.H.; Wu, W.T.; Wang, H.; Xiong, M.; Wang, W. Symmetry-based structure entropy of complex networks. Physica A 2008, 387, 2611–2619. [Google Scholar] [CrossRef] [Green Version]
  26. Nie, T.; Guo, Z.; Zhao, K.; Lu, Z.M. Using mapping entropy to identify node centrality in complex networks. Physica A 2016, 453, 290–297. [Google Scholar] [CrossRef]
  27. Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
  28. Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Rak, R.; Rak, E. The Fractional Preferential Attachment scale-free network model. Entropy 2020, 22, 509. [Google Scholar] [CrossRef]
  30. Zhang, Q.; Li, M.; Du, Y.; Deng, Y. Local structure entropy of complex networks. arXiv 2014, arXiv:1412.3910. Available online: https://arxiv.org/abs/1412.3910 (accessed on 20 July 2020).
  31. Pastor-Satorras, R.; Vespignani, A. Epidemic dynamics and endemic states in complex networks. Phys. Rev. E 2001, 63, 066117. [Google Scholar] [CrossRef] [Green Version]
  32. Zhao, Y.; Luo, X.; Lin, X.; Wang, H.; Kui, X.; Zhou, F.; Wang, J.; Chen, Y.; Chen, W. Visual analytics for electromagnetic situation awareness in radio monitoring and management. IEEE Trans. Vis. Comput. Graph. 2019, 26, 590–600. [Google Scholar] [CrossRef]
  33. Knight, W.R. A computer method for calculating Kendall’s tau with ungrouped data. J. Am. Stat. Assoc. 1966, 61, 436–439. [Google Scholar] [CrossRef]
Figure 1. Node network diagram.
Figure 2. Flow chart of the TsallisRank algorithm.
Figure 3. D and M curves of ranking methods in random synthetic scale-free networks: (a) D curve in BA networks; (b) M curve in BA networks; (c) D curve in FPA networks; (d) M curve in FPA networks.
Figure 4. Node rankings and frequency distributions of the ranking methods: (a) In Karate network; (b) In Dolphins network; (c) In Jazz network; (d) In Elegans network.
Figure 5. CCDF curves of the ranking methods: (a) In Karate network; (b) In Dolphins network; (c) In Jazz network; (d) In Elegans network.
Figure 6. Ranking lists and the curves of the Jaccard similarity coefficient of σ: (a) In Email network; (b) In Euroroad network; (c) In Yeast network; (d) In Hamsterster network.
Figure 7. Relationships between SIR β and Kendall’s tau of the ranking lists: (a) In Dolphins network; (b) In Euroroad network; (c) In Elegans network; (d) In Yeast network.
Figure 8. Relationships between modified SIR β 0 and Kendall’s tau of the ranking lists: (a) In Dolphins network; (b) In Jazz network.
Figure 9. Time curves of different algorithms in real networks.
Table 1. Some statistical data of random synthetic scale-free networks.
Network                   |V|      |E|      Average Degree  Maximum Degree  Assortativity  Clustering Coefficient
BAG_400_200               400      40,000   200.0           384             −0.398111      0.722654
BAG_600_300               600      90,000   300.0           577             −0.397054      0.721492
BAG_800_400               800      160,000  400.0           776             −0.397650      0.724718
BAG_1000_500              1000     250,000  500.0           967             −0.397126      0.722987
BAG_1200_600              1200     360,000  600.0           1159            −0.397411      0.723613
BAG_1400_700              1400     490,000  700.0           1347            −0.397143      0.723255
FPA_acyclic_f_1_BA_model  100,006  100,005  1.99998         1340            −0.014383      0.0
FPA_acyclic_f_07          100,006  100,005  1.99998         1621            −0.028993      0.0
FPA_acyclic_f_05          100,006  100,005  1.99998         4981            −0.047784      0.0
FPA_acyclic_f_02          100,006  100,005  1.99998         21,951          −0.157886      0.0
Table 2. Some statistical data of real networks.
Network      |V|     |E|     Average Degree  Maximum Degree  Assortativity  Clustering Coefficient
Karate       34      78      4.588           17              −0.4756        0.5706
Dolphins     62      159     5.129           12              −0.043594      0.2590
Jazz         198     2742    27.697          100             0.0202         0.6175
Elegans      453     2025    8.940           237             −0.2258        0.6465
Email        1133    5451    9.622           71              0.0782         0.2203
Euroroad     1174    1417    2.414           10              0.1267         0.0167
Yeast        2361    7182    6.0839          66              −0.0846        0.1301
Hamsterster  2426    16,631  13.711          273             0.0474         0.5376
PowerGrid    4941    6594    2.669           273             0.0035         0.0801
PGP          10,680  24,316  4.554           205             0.2382         0.2659
Table 3. D method evaluation performance analysis table.
Network      D(DC)   D(Ks)   D(LE)   D(MDD)  D(Cnc+)  D(TRank)
Karate       0.3235  0.1471  0.8235  0.4412  0.7647   0.7941
Dolphins     0.1935  0.0968  0.9194  0.4032  0.8871   0.9677
Jazz         0.3131  0.1869  0.9646  0.6768  0.9646   0.9697
Elegans      0.0883  0.0574  0.8366  0.1987  0.8676   0.9029
Email        0.0424  0.0477  0.8914  0.1703  0.9170   0.9762
Euroroad     0.0077  0.0068  0.1806  0.0187  0.0707   0.9446
Yeast        0.0237  0.0216  0.6357  0.0923  0.6192   0.7954
Hamsterster  0.0458  0.0528  0.6587  0.1620  0.6686   0.7003
PowerGrid    0.0032  0.0040  0.2117  0.0105  0.0565   0.9041
PGP          0.0078  0.0124  0.3727  0.0329  0.2902   0.7456
Table 4. M method analysis table.
Network      M(DC)   M(Ks)   M(LE)   M(MDD)  M(Cnc+)  M(TRank)
Karate       0.7079  0.5499  0.9577  0.7536  0.9472   0.9542
Dolphins     0.8312  0.5576  0.9905  0.9091  0.9895   0.9979
Jazz         0.9659  0.8951  0.9993  0.9911  0.9993   0.9994
Elegans      0.7922  0.7399  0.9972  0.8768  0.9980   0.9988
Email        0.8874  0.8521  0.9990  0.9233  0.9997   0.9999
Euroroad     0.4442  0.3312  0.9181  0.6510  0.9463   0.9990
Yeast        0.7472  0.7052  0.9921  0.7477  0.9962   0.9972
Hamsterster  0.8980  0.8907  0.9853  0.9274  0.9856   0.9858
PowerGrid    0.5927  0.3713  0.9635  0.6940  0.9568   0.9999
PGP          0.6193  0.5000  0.9781  0.6679  0.9939   0.9997
Table 5. Correlation coefficients of SIR and Kendall.
Network      β      β_th   τ(σ,DC)  τ(σ,Ks)  τ(σ,LE)  τ(σ,MDD)  τ(σ,Cnc+)  τ(σ,TRank)
Karate       0.250  0.129  0.6310   0.5490   0.6542   0.6542    0.9269     0.8128
Dolphins     0.150  0.147  0.7805   0.5796   0.7689   0.8170    0.8403     0.9418
Jazz         0.040  0.026  0.8371   0.7847   0.8415   0.8663    0.9455     0.9726
Elegans      0.050  0.025  0.6677   0.6931   0.5685   0.6902    0.8636     0.9199
Email        0.050  0.054  0.7892   0.7962   0.7654   0.8073    0.9413     0.9578
Euroroad     0.275  0.333  0.5572   0.4571   0.4249   0.6721    0.8337     0.9341
Yeast        0.100  0.061  0.5908   0.6147   0.5241   0.6490    0.9222     0.9289
Hamsterster  0.020  0.024  0.7447   0.7333   0.6416   0.7510    0.9234     0.9349
PowerGrid    0.200  0.258  0.6244   0.4503   0.5055   0.6667    0.7887     0.9107
PGP          0.100  0.053  0.3644   0.3651   0.2026   0.3745    0.7840     0.6913

Share and Cite

MDPI and ACS Style

Chen, X.; Zhou, J.; Liao, Z.; Liu, S.; Zhang, Y. A Novel Method to Rank Influential Nodes in Complex Networks Based on Tsallis Entropy. Entropy 2020, 22, 848. https://doi.org/10.3390/e22080848
