Similarity preserving overlapping community detection in signed networks

https://doi.org/10.1016/j.future.2020.10.034

Highlights

  • A novel overlapping community method called SPOCD for signed networks is proposed.

  • SPOCD is based on graph regularized binary semi-nonnegative matrix factorization.

  • SPOCD can well incorporate node similarity and geometric structure information.

  • SPOCD can directly identify overlapping communities and nodes.

  • Extensive experiments demonstrate the superiority of SPOCD.

Abstract

Community detection in signed networks is a challenging research problem and is of great importance to understanding the structural and functional properties of signed networks. It aims at dividing nodes into clusters with more intra-cluster links and fewer inter-cluster links, such that most positive links lie within clusters and most negative links lie between clusters. In recent years, some methods for community detection in signed networks have been proposed, but few of them focus on overlapping community detection. Moreover, most of them directly exploit the sparse link topology to detect communities, which often makes them perform poorly. In view of this, we propose a similarity preserving overlapping community detection (SPOCD) method. SPOCD first extracts node similarity information and geometric structure information from the link topology, and then uses a graph regularized binary semi-nonnegative matrix factorization (GRBSNMF) model to fuse these two sources of information to detect communities. Through this mechanism, nodes with high similarity are kept in the same community. In addition, SPOCD devises a special discretization strategy to obtain the binary community indicator matrix, which makes it convenient to identify overlapping communities in signed networks directly. We conduct extensive experiments on synthetic and real-world signed networks, and the results demonstrate that our method outperforms state-of-the-art methods.

Introduction

In the field of complex networks, networks containing both positive and negative links are called signed networks [1]. Positive links in signed networks denote positive relationships, such as “friend”, “trust”, “like”, “support” and “cooperative” relationships. In contrast, negative links denote negative relationships, such as “enemy”, “distrust”, “dislike”, “oppose” and “hostile” relationships. Signed networks are ubiquitous in the real world, such as social networks containing trust and distrust relationships, protein interaction networks containing activation and inhibition relationships and international relationship networks containing cooperation and hostility relationships. In addition to these naturally formed signed networks, we can also artificially construct signed networks from interactions among data objects by using specific algorithms. For example, in [2], Hassan et al. applied linguistic analysis techniques to identify attitudes (support or oppose) from online discussion texts and then built a debater signed network. In [3], Maniu et al. inferred a trust signed network by aggregating various user interactions on Wikipedia content. In [4], Hoang et al. computed the cosine similarities of all pairs of documents and then constructed a document signed network by treating the similarities as the weights of the corresponding links.

Owing to the diversity, popularity and availability of signed networks, signed network analysis and mining has drawn increasing attention, and community detection (a.k.a. node clustering) is one of its important research topics. Community detection in signed networks is motivated by the well-known structural balance theory in social science. Structural balance theory was originally proposed by Heider [5] and was later extended by Cartwright and Harary [6]. This theory states that a structurally balanced signed network is partitionable and can be separated into k communities (clusters), such that links within communities are positive and links between communities are negative. Unfortunately, real-world signed networks are often imbalanced [7], so it is hard to satisfy such a strict criterion for community detection in signed networks. Therefore, in practice we often follow a looser criterion: community detection in signed networks aims to find k communities such that most positive links lie within communities and most negative links lie between communities. This criterion makes community detection in real-world signed networks more practical.
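
The criterion above can be made concrete by counting the links that violate it for a given partition. The following is a minimal sketch in Python, not part of SPOCD itself; the function name count_violations, the edge-list format and the toy network are assumptions made purely for illustration, and the sketch handles only non-overlapping assignments.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge format: (u, v, sign) with sign = +1 or -1.
SignedEdge = Tuple[Hashable, Hashable, int]


def count_violations(
    edges: Iterable[SignedEdge], community: Dict[Hashable, int]
) -> Tuple[int, int]:
    """Return (negative links inside communities, positive links between them).

    A partition that satisfies the strict balance criterion yields (0, 0);
    under the looser criterion both counts should merely be small.
    """
    neg_inside = 0
    pos_between = 0
    for u, v, sign in edges:
        same = community[u] == community[v]
        if sign < 0 and same:
            neg_inside += 1
        elif sign > 0 and not same:
            pos_between += 1
    return neg_inside, pos_between


# Toy signed network (illustrative data, not taken from the paper).
edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),  # positive links in group 0
    ("d", "e", 1),                                # positive link in group 1
    ("c", "d", -1), ("a", "e", -1),               # negative links between groups
    ("b", "d", 1),                                # positive link between groups
]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

violations = count_violations(edges, community)
print(violations)
```

Here the single positive link between the two groups is the only violation, so this partition meets the looser criterion but not the strict one.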

Community detection is of great importance to understanding the structural and functional properties of signed networks. It also has high application value, e.g., recommending products to user groups in signed social networks [8], detecting protein complexes from signed protein–protein interaction networks [9] and finding alliances in international relationship networks [10]. Although methods for community detection in unsigned networks (i.e., networks with only positive links) have been extensively studied, they cannot be simply applied to signed networks because of the negative links. Indeed, the definitions of community in unsigned and signed networks are substantially different. In unsigned networks, a community is defined as a group of nodes with dense links within the group and sparse links to other groups, whereas in signed networks, communities are defined not only by the density of links but also by the signs of links. Therefore, detecting communities in signed networks is more challenging.

Recently, community detection in signed networks has received increasing attention, and meanwhile some related methods have been proposed. Although all these methods have achieved improved performance in some cases, they still suffer from the following problems.

  • Most of them detect communities by directly using the link topology of signed networks. However, this type of information (often represented as a binary adjacency matrix) is usually very sparse, noisy and even incomplete, and hence is insufficient to identify high-quality communities.

  • They mainly aim to uncover non-overlapping communities, where every node is assigned to only one community. However, overlapping communities, where a node is allowed to belong to multiple communities, are common in real-world signed networks. For example, in a debater signed network, one user can participate in more than one group, even if these groups hold different opinions. Therefore, to learn the community patterns of signed networks adequately, it is necessary to develop a method for overlapping community detection.

Aiming at addressing the aforementioned problems, we propose a similarity preserving overlapping community detection (SPOCD) method for signed networks and our main work is summarized as follows:

  • Based on the available link topology, we first devise an effective node similarity measure to extract node similarity information from the network, and then further obtain the geometric structure information by constructing the corresponding p-nearest neighbors (p-NN) graph. The fusion of these two sources of information is selected as a good alternative to the link topology information when used for community detection.

  • To detect overlapping communities effectively, we specially design a graph regularized binary semi-nonnegative matrix factorization (GRBSNMF) model. This model not only can well incorporate node similarity information and geometric structure information of the network to improve the performance, but also can obtain the binary community indicator matrix, which is very convenient for identifying overlapping communities directly. Besides, an effective learning algorithm with guaranteed convergence is proposed to optimize the GRBSNMF model.

  • To evaluate the performance of our method, we conduct extensive experiments on both synthetic and real-world signed networks. The results demonstrate the superiority of our method over state-of-the-art methods.
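
To illustrate why a binary community indicator matrix is convenient for overlapping communities, the sketch below reads communities and overlapping nodes directly off such a matrix. It is only an illustration in Python: the matrix H is written by hand rather than learned by GRBSNMF, and the helper names communities_from_indicator and overlapping_nodes are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def communities_from_indicator(
    H: Sequence[Sequence[int]], nodes: Sequence[Hashable]
) -> List[Set[Hashable]]:
    """Interpret H[i][c] == 1 as 'nodes[i] belongs to community c'."""
    k = len(H[0]) if H else 0
    communities: List[Set[Hashable]] = [set() for _ in range(k)]
    for node, row in zip(nodes, H):
        for c, flag in enumerate(row):
            if flag == 1:
                communities[c].add(node)
    return communities


def overlapping_nodes(
    H: Sequence[Sequence[int]], nodes: Sequence[Hashable]
) -> Set[Hashable]:
    """A node overlaps when its row contains more than one 1."""
    return {node for node, row in zip(nodes, H) if sum(row) > 1}


# Hand-written indicator matrix for 5 nodes and 2 communities (assumed data).
nodes = ["a", "b", "c", "d", "e"]
H = [
    [1, 0],
    [1, 0],
    [1, 1],  # node "c" is a member of both communities
    [0, 1],
    [0, 1],
]

found = communities_from_indicator(H, nodes)
shared = overlapping_nodes(H, nodes)
print(found, shared)
```

Because each row can hold several ones, no thresholding or post-processing step is needed to decide which nodes are shared between communities.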

The rest of this paper is organized as follows. In Section 2, we firstly give a brief review of related work. Then the proposed method SPOCD is detailed in Section 3. In Section 4, we report extensive experimental results that demonstrate the effectiveness of our method. Finally, the conclusion and future work of this paper are given in Section 5.

Section snippets

Related work

In this section, we briefly review the related work regarding community detection in signed networks and NMF-based community detection, both of which are most relevant to the topic of this paper.

Methodology

In this section, we first introduce notations and the problem statement, and then describe how to extract node similarity information and geometric structure information from the link topology information. Finally, we present the SPOCD method in detail, including the GRBSNMF model for overlapping community detection, solutions to the optimization problems and the overlapping community detection algorithm.
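
As a rough illustration of this two-step extraction, the sketch below builds a node similarity matrix from a signed adjacency matrix and then derives a p-NN graph from it. The similarity measure used by SPOCD is not given in this excerpt, so cosine similarity between signed adjacency rows is used here purely as a stand-in assumption; the function names, the symmetrization rule and the toy matrix are likewise hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(A: Matrix) -> Matrix:
    """Stand-in similarity: cosine between signed adjacency rows.

    This is an assumption for illustration, not the measure defined in the paper.
    """
    n = len(A)
    norms = [math.sqrt(sum(x * x for x in row)) for row in A]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(A[i], A[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def pnn_graph(S: Matrix, p: int) -> List[List[int]]:
    """Connect each node to its p most similar nodes, then symmetrize."""
    n = len(S)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: S[i][j], reverse=True)[:p]
        for j in nearest:
            W[i][j] = 1
            W[j][i] = 1  # keep an edge if either endpoint selects the other
    return W


# Toy signed adjacency matrix: nodes 0-1 and 2-3 are friendly pairs (assumed data).
A = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]

S = cosine_similarity_matrix(A)
W = pnn_graph(S, p=1)
print(W)
```

With p = 1, each node is linked only to the partner that shares its pattern of friends and foes, which is the kind of local structure a graph regularizer can then exploit.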

Experimental study

In this section, to validate the effectiveness of our proposed method SPOCD, we conduct extensive experiments on several synthetic and real-world signed networks. SPOCD is implemented in MATLAB 2014a, and all experiments are conducted on a PC with a 64-bit Windows 7 system, a 3.4 GHz Intel Core i7-6700 CPU and 32 GB RAM.

Conclusions

Aiming at the problem of overlapping community detection in signed networks, in this paper we propose a method SPOCD based on graph regularized binary semi-nonnegative matrix factorization (GRBSNMF). SPOCD flexibly provides a unified framework to integrate node similarity information and geometric structure information, which are both extracted from the link topology information of the network. In this way, SPOCD can take full advantage of the link topology information to improve the

CRediT authorship contribution statement

Chaobo He: Writing - original draft, Conceptualization, Methodology. Hai Liu: Writing - review & editing. Yong Tang: Supervision. Shuangyin Liu: Resources, Investigation. Xiang Fei: Writing - review & editing. Qiwei Cheng: Software, Visualization. Hanchao Li: Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 62077045, Grant U1811263 and Grant 61772211, in part by the Humanity and Social Science Youth Foundation of Ministry of Education of China under Grant 19YJCZH049, in part by the Natural Science Foundation of Guangdong Province of China under Grant 2019A1515011292, in part by the Science and Technology Support Program of Guangdong Province of China under Grant 2017A040405057, and in part by the Science

Chaobo He received his Ph.D., M.S. and B.S. degrees from South China Normal University, China in 2014, 2007 and 2004, respectively. He is currently a professor at Zhongkai University of Agriculture and Engineering, China, and is also a visiting scholar in the School of Data and Computer Science, Sun Yat-sen University, China. His research interests are machine learning and social computing. He has published over 20 papers in international journals and conferences.

References (59)

  • T.A. Hoang, E.P. Lim, Highly efficient mining of overlapping clusters in signed weighted networks, in: ACM...
  • F. Heider, Attitudes and cognitive organization, J. Psychol. (1946)
  • D. Cartwright et al., Structural balance: a generalization of Heider’s theory, Psychol. Rev. (1956)
  • J. Leskovec, D.P. Huttenlocher, J.M. Kleinberg, Signed networks in social media, in: ACM SIGCHI Conference on Human...
  • J.L. Tang, C. Aggarwal, H. Liu, Recommendations in signed social networks, in: ACM International Conference on World...
  • O.Y. Le et al., Detecting protein complexes from signed protein-protein interaction networks, IEEE/ACM Trans. Comput. Biol. Bioinform. (2015)
  • L.Y. Chu, Z.F. Wang, J. Pei, J.N. Wang, Z.J. Zhao, E.H. Chen, Finding gangs in war from signed networks, in: ACM SIGKDD...
  • M.E.J. Newman et al., Finding and evaluating community structure in networks, Phys. Rev. E (2004)
  • S. Gómez et al., Analysis of community structure in networks of correlated data, Phys. Rev. E (2009)
  • Y.D. Li et al., A comparative analysis of evolutionary and memetic algorithms for community detection from signed social networks, Soft Comput. (2014)
  • P. Anchuri, M. Magdon-Ismail, Communities and balance in signed networks: a spectral approach, in: IEEE/ACM...
  • A. Amelio, C. Pizzuti, Community mining in signed networks: a multiobjective approach, in: IEEE/ACM International...
  • J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner, E.W. De, L.S. Albayrak, Spectral analysis of signed graphs for...
  • K.Y. Chiang, J.J.Y. Whang, I.S. Dhillon, Scalable clustering of signed networks using balance normalized cut, in: ACM...
  • B. Yang et al., Stochastic blockmodeling and variational Bayes learning for signed network analysis, IEEE Trans. Knowl. Data Eng. (2017)
  • J.Q. Jiang, Stochastic block model and exploratory analysis in signed networks, Phys. Rev. E (2015)
  • S.Q. Ping et al., Community detection in signed networks based on the signed stochastic block model and exact ICL, IEEE Access (2019)
  • Y. Chen et al., Overlapping community detection in networks with positive and negative links, J. Stat. Mech. Theory Exp. (2014)
  • B. Yang et al., Community mining from signed social networks, IEEE Trans. Knowl. Data Eng. (2007)

Hai Liu received the Ph.D. degree from the School of Data and Computer Science, Sun Yat-sen University, China, in 2010. He is currently an associate professor with the School of Computer Science, South China Normal University. His current research interests include machine learning, data mining and big data.

Yong Tang received his B.S. and M.Sc. degrees from Wuhan University in 1985 and 1990, respectively, and his Ph.D. degree from University of Science and Technology of China in 2001, all in computer science. He is now a Professor and Dean of the School of Computer Science at South China Normal University (SCNU). He serves as the director of the services computing engineering research center of Guangdong province. He was vice dean of the School of Information Science and Technology at Sun Yat-sen University before he joined SCNU in 2009. He has published more than 200 papers and books. As a supervisor he has had more than 40 Ph.D. students and postdoctoral researchers since 2003 and more than 100 Master's students since 1996. He is a Distinguished Member and the director of the Technical Committee on Collaborative Computing of China Computer Federation. He has also served as general or program committee co-chair of more than 10 conferences.

Shuangyin Liu received the Ph.D. degree from the College of Information and Electrical Engineering, China Agricultural University, in 2014. He is currently a professor in the School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, China. His current research interests are intelligent information systems for agriculture, artificial intelligence, software engineering and computational intelligence.

Xiang Fei received a B.Sc. and a Ph.D. from Southeast University, China in 1992 and 1999, respectively. After graduation, he worked as a post-doctoral research fellow on a number of projects including European IST Programs and EPSRC. He is currently a senior lecturer in the School of Computing, Electronics and Maths at Coventry University. His current research interests include machine learning and data mining in cyber–physical systems.

Qiwei Cheng received his B.Sc. degree from the School of Computing Science, Zhongkai University of Agriculture and Engineering, China, in 2019. He is currently an M.Sc. student at South China Normal University, China. His research interests are machine learning and social computing.

Hanchao Li received his B.Sc. degree in Mathematics from University of Warwick in 2013 and his M.Sc. degree in Computing from Coventry University in 2015. He is currently a Ph.D. research student at Coventry University, working on music information retrieval, i.e., data mining in the music subject area. His research interests are big data, data mining, machine learning and mathematics-related research. He has published several conference and journal papers.