Similarity preserving overlapping community detection in signed networks
Introduction
In the field of complex networks, networks containing both positive and negative links are called signed networks [1]. Positive links in signed networks denote positive relationships, such as “friend”, “trust”, “like”, “support” and “cooperative” relationships. Conversely, negative links denote negative relationships, such as “enemy”, “distrust”, “dislike”, “oppose” and “hostile” relationships. Signed networks are ubiquitous in the real world, e.g., social networks containing trust and distrust relationships, protein interaction networks containing activation and inhibition relationships and international relationship networks containing cooperation and hostility relationships. In addition to these naturally formed signed networks, signed networks can also be constructed artificially from interactions among data objects by using specific algorithms. For example, in [2], Hassan et al. applied linguistic analysis techniques to identify attitudes (support or oppose) in online discussion texts and then built a debater signed network. In [3], Maniu et al. inferred a trust signed network by aggregating various user interactions on Wikipedia content. In [4], Hoang et al. computed the cosine similarities of all pairs of documents and then constructed a document signed network by treating the similarities as the weights of the corresponding links.
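The similarity-based construction attributed to Hoang et al. can be illustrated with a small sketch. It assumes each document is already a real-valued feature vector whose cosine similarities can be negative (e.g., mean-centred features; an assumption, since raw term counts are nonnegative and never yield negative cosines). The function name `cosine_signed_network` and the optional threshold `tau` are hypothetical and not taken from [4].

```python
import numpy as np

def cosine_signed_network(X, tau=0.0, eps=1e-12):
    """Build a weighted signed adjacency matrix from the rows of X.

    W[i, j] is the cosine similarity between rows i and j: positive values act
    as positive links and negative values as negative links. Entries with
    |W[i, j]| < tau are dropped (tau is a hypothetical sparsification knob).
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)      # unit-normalise each document vector
    W = Xn @ Xn.T                        # pairwise cosine similarities in [-1, 1]
    np.fill_diagonal(W, 0.0)             # no self-loops
    if tau > 0:
        W[np.abs(W) < tau] = 0.0         # keep only sufficiently strong links
    return W

# Toy example: documents 0 and 1 point the same way, document 2 points the opposite way.
W = cosine_signed_network([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
print(np.sign(W))
```

Treating every similarity as a link yields a dense complete graph; the hypothetical `tau` shows one simple way to sparsify it, but the source does not say whether or how [4] thresholds.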
Due to the diversity, popularity and availability of signed networks, signed network analysis and mining have drawn increasing attention, and community detection (a.k.a. node clustering) is one of their important research topics. Community detection in signed networks is motivated by the well-known structural balance theory in social science. Structural balance theory was originally proposed by Heider [5] and later extended by Cartwright and Harary [6]. The theory states that a structurally balanced signed network is partitionable: it can be separated into communities (clusters) such that links within communities are positive and links between communities are negative. Unfortunately, real-world signed networks are often imbalanced [7], so this strict criterion can rarely be satisfied in practice. A looser criterion is therefore usually adopted: community detection in signed networks seeks communities such that most positive links lie within communities and most negative links lie between communities. This relaxation makes community detection in real-world signed networks tractable.
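The loose criterion can be checked mechanically for any candidate partition. The sketch below counts the negative links that fall inside communities and the positive links that fall between them (often called frustrated links); a lower total means the partition fits the criterion better. The function name `frustration` and the dense-matrix input are assumptions for illustration, not part of the paper.

```python
import numpy as np

def frustration(A, labels):
    """Count links that violate the loose balance criterion for a partition.

    A: symmetric signed adjacency matrix (>0 positive link, <0 negative link).
    labels: one community label per node (non-overlapping partition).
    Returns (negative links inside communities, positive links between communities).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]            # True if i and j share a community
    upper = np.triu(np.ones(A.shape, dtype=bool), k=1)   # count each undirected link once
    neg_within = np.sum((A < 0) & same & upper)
    pos_between = np.sum((A > 0) & ~same & upper)
    return int(neg_within), int(pos_between)

A = np.array([[ 0,  1, -1,  0],
              [ 1,  0,  0,  1],
              [-1,  0,  0,  1],
              [ 0,  1,  1,  0]])
print(frustration(A, [0, 0, 1, 1]))   # (0, 1): only the positive link 1-3 crosses communities
print(frustration(A, [0, 1, 0, 1]))   # (1, 2)
```

This check only applies to non-overlapping partitions; with overlapping communities a link can be "within" and "between" at the same time, which is one reason overlapping detection needs its own formulation.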
Community detection is of great importance for understanding the structural and functional properties of signed networks. It also has high application value, e.g., recommending products to user groups in signed social networks [8], detecting protein complexes from signed protein–protein interaction networks [9] and finding alliances in international relationship networks [10]. Although methods for community detection in unsigned networks (i.e., networks with only positive links) have been extensively studied, they cannot simply be applied to signed networks because of the negative links. In fact, the definitions of community in unsigned and signed networks differ substantially. In unsigned networks, a community is a group of nodes with dense links within the group and sparse links to the rest of the network, whereas in signed networks communities are defined not only by the density of links but also by their signs. Detecting communities in signed networks is therefore more challenging.
Recently, community detection in signed networks has received increasing attention, and a number of methods have been proposed. Although these methods achieve improved performance in some cases, they still suffer from the following problems.
- Most of them detect communities directly from the link topology of signed networks. However, this information (often represented as a binary adjacency matrix) is usually sparse, noisy and even incomplete, and hence insufficient for identifying high-quality communities.
- They mainly aim to uncover non-overlapping communities, where every node is assigned to exactly one community. However, overlapping communities, where a node may belong to multiple communities, are common in real-world signed networks. For example, in a debater signed network, one user can participate in more than one group, even if these groups hold different opinions. Therefore, to capture the community patterns of signed networks adequately, methods for overlapping community detection are needed.
To address these problems, we propose a similarity preserving overlapping community detection (SPOCD) method for signed networks. Our main contributions are summarized as follows:
- Based on the available link topology, we first devise an effective node similarity measure to extract node similarity information from the network, and then obtain geometric structure information by constructing the corresponding k-nearest neighbors (k-NN) graph. The fusion of these two sources of information is used in place of the raw link topology for community detection.
- To detect overlapping communities effectively, we design a graph regularized binary semi-nonnegative matrix factorization (GRBSNMF) model. This model not only incorporates node similarity information and geometric structure information to improve performance, but also yields a binary community indicator matrix from which overlapping communities can be identified directly. In addition, we propose an effective learning algorithm with guaranteed convergence to optimize the GRBSNMF model.
- To evaluate the performance of our method, we conduct extensive experiments on both synthetic and real-world signed networks. The results demonstrate the superiority of our method over state-of-the-art methods.
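The two preprocessing steps in the first contribution can be sketched as follows. This excerpt does not define SPOCD's node similarity measure or how the two sources of information are fused, so the cosine similarity between rows of A + I used here is an assumed stand-in, and the names `node_similarity`, `knn_graph` and the parameter `k` are hypothetical. The sketch only produces a similarity matrix S and a symmetric k-NN graph W.

```python
import numpy as np

def node_similarity(A):
    """Assumed stand-in similarity: cosine similarity between rows of A + I.

    Adding the identity makes each node part of its own neighbourhood, so a
    direct positive link raises the similarity of its endpoints.
    This is NOT the measure defined by SPOCD.
    """
    B = np.asarray(A, dtype=float)
    B = B + np.eye(B.shape[0])
    Bn = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    S = Bn @ Bn.T                            # signed similarities in [-1, 1]
    np.fill_diagonal(S, 0.0)
    return S

def knn_graph(S, k):
    """Symmetric 0/1 k-NN graph: i~j if j is among i's k most similar nodes, or vice versa."""
    n = S.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        scores = S[i].copy()
        scores[i] = -np.inf                  # a node is never its own neighbour
        nbrs = np.argsort(scores)[::-1][:k]  # indices of the k largest similarities
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)                # symmetrise with a logical "or"

# Two positive triangles joined by a single negative link (0-3).
A = np.array([[0, 1, 1, -1, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
              [-1, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]], dtype=float)
S = node_similarity(A)
W = knn_graph(S, k=2)
```

Symmetrising with "or" is one common convention for k-NN graphs; a mutual ("and") k-NN graph is a sparser alternative, and the source does not say which one SPOCD uses.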
The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 details the proposed method SPOCD. Section 4 reports extensive experimental results that demonstrate the effectiveness of our method. Finally, Section 5 concludes the paper and outlines future work.
Related work
In this section, we briefly review related work on community detection in signed networks and on NMF-based community detection, the two topics most relevant to this paper.
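As background for NMF-based community detection on signed data, a generic graph regularized semi-NMF can be written as below. Semi-NMF (Ding, Li and Jordan) lets the data matrix X and the basis F take mixed signs while keeping the membership matrix G nonnegative; the graph term and its update are obtained here by splitting the gradient into positive and negative parts. This is an illustrative formulation, not the GRBSNMF objective of SPOCD, which additionally requires a binary indicator matrix. The symbols are introduced here as assumptions: X is the m × n (possibly signed) input, F is m × c and unconstrained, G is n × c and nonnegative, W is an n × n nonnegative affinity matrix (e.g., a k-NN graph), λ ≥ 0 is a regularization weight, and ⊙, the fraction and the square root act elementwise.

```latex
\begin{aligned}
&\min_{F,\; G \ge 0}\; J(F,G) = \lVert X - F G^{\top} \rVert_F^2
  + \lambda\, \mathrm{tr}\!\left(G^{\top} L G\right),
  \qquad L = D - W,\quad D_{ii} = \textstyle\sum_j W_{ij},\\[4pt]
&F \leftarrow X G \left(G^{\top} G\right)^{-1},\\[4pt]
&G \leftarrow G \odot \sqrt{\frac{(X^{\top}F)^{+} + G\,(F^{\top}F)^{-} + \lambda W G}
  {(X^{\top}F)^{-} + G\,(F^{\top}F)^{+} + \lambda D G}},
  \qquad M^{\pm} = \tfrac{1}{2}\left(\lvert M \rvert \pm M\right).
\end{aligned}
```

The F-update is the exact least-squares solution for fixed G, while the G-update never changes the sign of an entry, so G stays nonnegative from a positive start.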
Methodology
In this section, we first introduce the notation and the problem statement, and then describe how to extract node similarity information and geometric structure information from the link topology. Finally, we present the SPOCD method in detail, including the GRBSNMF model for overlapping community detection, the solution of its optimization problems and the overlapping community detection algorithm.
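A minimal NumPy sketch of a graph regularized semi-NMF followed by a binarization step illustrates how a membership matrix can be turned into overlapping communities. It is a generic stand-in, not the GRBSNMF model or its learning algorithm: binary constraints are not enforced during optimization, no convergence guarantee is claimed, and the functions `graph_reg_semi_nmf`, `binarize`, `communities` and the threshold `tau` are hypothetical.

```python
import numpy as np

def pos(M):
    return (np.abs(M) + M) / 2.0   # positive part M^+

def neg(M):
    return (np.abs(M) - M) / 2.0   # negative part M^-, so that M = M^+ - M^-

def graph_reg_semi_nmf(X, W, c, lam=0.1, iters=300, seed=0, eps=1e-10):
    """Multiplicative updates for min ||X - F G^T||_F^2 + lam * tr(G^T L G), G >= 0.

    X may contain negative entries; L = D - W is the Laplacian of a nonnegative
    affinity graph W (for example a k-NN graph). Generic sketch only: no binary
    constraint is enforced here.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    G = rng.random((X.shape[1], c)) + 0.1          # strictly positive initialisation
    for _ in range(iters):
        F = X @ G @ np.linalg.pinv(G.T @ G)        # least-squares update of the free factor F
        XtF, FtF = X.T @ F, F.T @ F
        num = pos(XtF) + G @ neg(FtF) + lam * (W @ G)
        den = neg(XtF) + G @ pos(FtF) + lam * (D @ G)
        G *= np.sqrt(num / (den + eps))            # elementwise update keeps G >= 0
    F = X @ G @ np.linalg.pinv(G.T @ G)            # refit F to the final G
    return F, G

def binarize(G, tau=0.5):
    """Hypothetical readout: node i joins community j if G[i, j] >= tau * max_k G[i, k]."""
    return (G >= tau * G.max(axis=1, keepdims=True)).astype(int)

def communities(B):
    """Return the member nodes of each column of a binary indicator matrix B."""
    return [np.flatnonzero(B[:, j]).tolist() for j in range(B.shape[1])]

# Toy signed network: two positive triangles joined by one negative link (0-3).
A = np.array([[0, 1, 1, -1, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
              [-1, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]], dtype=float)
W = (A > 0).astype(float)                           # positive links as a stand-in affinity graph
F, G = graph_reg_semi_nmf(A + np.eye(6), W, c=2)
print(communities(binarize(G)))
```

Because a row of the binary matrix may contain several ones, a node can be listed in more than one community; with `tau=1.0` the readout collapses to a (near) hard argmax assignment.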
Experimental study
In this section, to validate the effectiveness of the proposed method SPOCD, we conduct extensive experiments on several synthetic and real-world signed networks. SPOCD is implemented in MATLAB 2014a, and all experiments are conducted on a PC running 64-bit Windows 7 with a 3.4 GHz Intel Core i7-6700 CPU and 32 GB of RAM.
Conclusions
To address overlapping community detection in signed networks, this paper proposes SPOCD, a method based on graph regularized binary semi-nonnegative matrix factorization (GRBSNMF). SPOCD provides a flexible, unified framework that integrates node similarity information and geometric structure information, both of which are extracted from the link topology of the network. In this way, SPOCD can take full advantage of the link topology information to improve the
CRediT authorship contribution statement
Chaobo He: Writing - original draft, Conceptualization, Methodology. Hai Liu: Writing - review & editing. Yong Tang: Supervision. Shuangyin Liu: Resources, Investigation. Xiang Fei: Writing - review & editing. Qiwei Cheng: Software, Visualization. Hanchao Li: Formal analysis.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 62077045, Grant U1811263 and Grant 61772211, in part by the Humanity and Social Science Youth Foundation of Ministry of Education of China under Grant 19YJCZH049, in part by the Natural Science Foundation of Guangdong Province of China under Grant 2019A1515011292, in part by the Science and Technology Support Program of Guangdong Province of China under Grant 2017A040405057, and in part by the Science
References (59)
- et al., A survey of signed network mining in social media, ACM Comput. Surv. (2016).
- et al., Evolving the attribute flow for dynamical clustering in signed networks, Chaos Solitons Fractals (2018).
- et al., Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data, Methods (2016).
- et al., Combination of links and node contents for community discovery using a graph regularization approach, Future Gener. Comput. Syst. (2019).
- et al., Autonomous semantic community detection via adaptively weighted low-rank approximation, ACM Trans. Multimed. Comput. Commun. Appl. (2019).
- et al., Nonnegative matrix factorization with mixed hypergraph regularization for community detection, Inform. Sci. (2018).
- et al., Community detection based on regularized semi-nonnegative matrix tri-factorization in signed networks, Mob. Netw. Appl. (2018).
- et al., Detecting drug communities and predicting comprehensive drug-drug interactions via balance regularized semi-nonnegative matrix factorization, J. Cheminformatics (2019).
- A. Hassan, A. Abu-Jbara, D. Radev, Extracting signed social networks from text, in: Workshop of TextGraphs-7:...
- S. Maniu, B. Cautis, T. Abdessalem, Building a signed network from interactions in Wikipedia, in: ACM International...
- Attitudes and cognitive organization, J. Psychol.
- Structural balance: a generalization of Heider’s theory, Psychol. Rev.
- Detecting protein complexes from signed protein-protein interaction networks, IEEE/ACM Trans. Comput. Biol. Bioinform.
- Finding and evaluating community structure in networks, Phys. Rev. E.
- Analysis of community structure in networks of correlated data, Phys. Rev. E.
- A comparative analysis of evolutionary and memetic algorithms for community detection from signed social networks, Soft Comput.
- Stochastic blockmodeling and variational Bayes learning for signed network analysis, IEEE Trans. Knowl. Data Eng.
- Stochastic block model and exploratory analysis in signed networks, Phys. Rev. E.
- Community detection in signed networks based on the signed stochastic block model and exact ICL, IEEE Access.
- Overlapping community detection in networks with positive and negative links, J. Stat. Mech. Theory Exp.
- Community mining from signed social networks, IEEE Trans. Knowl. Data Eng.
Chaobo He received his Ph.D., M.S. and B.S. degrees from South China Normal University, China, in 2014, 2007 and 2004, respectively. He is currently a professor at Zhongkai University of Agriculture and Engineering, China, and a visiting scholar at the School of Data and Computer Science, Sun Yat-sen University, China. His research interests are machine learning and social computing. He has published over 20 papers in international journals and conferences.
Hai Liu received the Ph.D. degree from the School of Data and Computer Science, Sun Yat-sen University, China, in 2010. He is currently an associate professor with the School of Computer Science, South China Normal University. His current research interests include machine learning, data mining and big data.
Yong Tang received his B.S. and M.Sc. degrees from Wuhan University in 1985 and 1990, respectively, and his Ph.D. degree from the University of Science and Technology of China in 2001, all in computer science. He is now a Professor and Dean of the School of Computer Science at South China Normal University (SCNU). He serves as the director of the Services Computing Engineering Research Center of Guangdong Province. Before joining SCNU in 2009, he was vice dean of the School of Information Science and Technology at Sun Yat-sen University. He has published more than 200 papers and books. He has supervised more than 40 Ph.D. students and postdoctoral researchers since 2003 and more than 100 master's students since 1996. He is a Distinguished Member of the China Computer Federation and the director of its Technical Committee on Collaborative Computing. He has also served as general or program committee co-chair of more than 10 conferences.
Shuangyin Liu received the Ph.D. degree from the College of Information and Electrical Engineering, China Agricultural University, in 2014. He is currently a professor in the School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, China. His current research interests are agricultural intelligent information systems, artificial intelligence, software engineering and computational intelligence.
Xiang Fei received his B.Sc. and Ph.D. degrees from Southeast University, China, in 1992 and 1999, respectively. After graduation, he worked as a postdoctoral research fellow on a number of projects, including European IST Programs and EPSRC. He is currently a senior lecturer in the School of Computing, Electronics and Maths at Coventry University. His current research interests include machine learning and data mining in cyber–physical systems.
Qiwei Cheng received his B.Sc. degree from the School of Computing Science, Zhongkai University of Agriculture and Engineering, China, in 2019. He is currently an M.Sc. student at South China Normal University, China. His research interests are machine learning and social computing.
Hanchao Li received his B.Sc. degree in Mathematics from the University of Warwick in 2013 and his M.Sc. degree in Computing from Coventry University in 2015. He is currently a Ph.D. research student at Coventry University, working on music information retrieval, i.e., data mining in the music domain. His research interests are big data, data mining, machine learning and mathematics-related research. He has published several conference and journal papers.