Community detection via an efficient nonconvex optimization approach based on modularity

https://doi.org/10.1016/j.csda.2020.107163

Abstract

Maximizing modularity is a widely used method for community detection, which is generally solved by approximate or greedy search because of its high complexity. In this paper, we propose a method, named MSM, for modularity maximization, which reformulates the modularity maximization problem as a subset identification problem and maximizes a surrogate of the modularity. The surrogate is constructed by replacing the discontinuous indicator functions in the reformulated modularity function with the continuous truncated L1 function. This approximates the NP-hard problem of maximizing the modularity function by a non-convex optimization problem, which can be solved efficiently via DC (Difference of Convex functions) programming. The proposed MSM method can be used for community detection when the number of communities is given, and it also applies when the number of communities is unknown. We demonstrate the advantages of the proposed MSM method through simulation results and real data analyses.

Introduction

In recent years, network analysis has become a very popular research direction in many fields. The literature covers many kinds of networks, including technological networks (Watts and Strogatz, 1998, Gastner and Newman, 2004), social networks (Travers and Milgram, 1969, Guimera et al., 2003), information networks (Flake et al., 2002), biological networks (Négyessy et al., 2006, Barabasi et al., 2011), etc. In many cases, the network units can be divided into groups such that there are many edges between units in the same group, but relatively few edges between units in different groups. Such groups are viewed as communities, which are often associated with important structural characteristics of a complex system.

To recover network communities, a large number of algorithms have been proposed in the literature. These include greedy algorithms, such as the GN algorithm (Newman and Girvan, 2004) and the Kernighan–Lin algorithm (Kernighan and Lin, 1970); algorithms based on optimizing some reasonable criterion over all possible partitions of a network, such as the spectral clustering methods (McSherry, 2001, Lei and Rinaldo, 2015) and the modularity optimization methods (Clauset et al., 2004, Newman, 2006a, Chen et al., 2018); and algorithms based on probability models, such as the stochastic block models (SBMs) (Holland et al., 1983, Zhang et al., 2017), the degree-corrected stochastic block models (DCSBMs) (Karrer and Newman, 2011, Chen et al., 2018) and the latent space models (Hoff et al., 2002). In addition, some methods are designed for community detection with overlap (Ball et al., 2011, Amini and Levina, 2018, Jin et al., 2019, Mao et al., 2020), that is, where the nodes in the network may belong to more than one community.

In this paper, we mainly focus on the modularity optimization algorithms for network community detection, which are widely used due to their practicality and efficiency (Reichardt and Bornholdt, 2007, Chen et al., 2014). Modularity is considered to be one of the most important community detection criteria, which has the unique privilege of being at the same time a global criterion to define a community, a quality function and the key ingredient of the most popular method of graph clustering (Fortunato, 2010). Under the modularity framework, the key problem is maximizing the modularity function, which is an NP-hard problem (Newman, 2006b). The earliest algorithm for maximizing the modularity function is the GN algorithm (Newman and Girvan, 2004), which is a greedy algorithm. To avoid some unnecessary operations of the GN algorithm on sparse networks, Clauset et al. (2004) proposed the CNM algorithm. These two algorithms are based on hierarchical search, while some follow-up algorithms are based on spectral optimization. For example, Newman (2006a) rewrote the expression of modularity in terms of the eigenspectrum of the modularity matrix, and then proposed the EIGN algorithm based on this expression. In addition, there are some methods that optimize modularity based on block models (Chen et al., 2018).

These algorithms optimize modularity approximately, trying to find a proper balance between community detection accuracy and computational efficiency. Besides, there are many useful strategies for approximate optimization, some of which relax the binary membership assignment to a continuous version to ease the optimization (Amini and Levina, 2018, Liu et al., 2017). In particular, Liu et al. (2017) rewrote the objective function of a subset selection problem in terms of indicator functions, and then approximated the indicator functions with the truncated L1 function proposed by Shen et al. (2012). Drawing on the ideas of Shen et al. (2012) and Liu et al. (2017), we reformulate the community detection problem as a subset identification problem, which is solved by maximizing a surrogate of modularity. The proposed method is therefore named Maximizing the Surrogate of Modularity, or MSM for short. Specifically, the surrogate of modularity is constructed by replacing the discontinuous indicator functions in the reformulated modularity function with the continuous truncated L1 function, and by adding some regularization terms as in Liu et al. (2017). As a result, the NP-hard problem of maximizing the modularity function is approximated by a non-convex optimization problem, which can be efficiently solved via DC programming (Le Thi and Tao, 2005). We demonstrate the advantages of the proposed MSM method through simulation results and real data analyses.
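The surrogate idea can be illustrated on a single scalar. The sketch below assumes the usual form of the truncated L1 function from Shen et al. (2012), min(|x|/tau, 1), as a continuous stand-in for the indicator I(x != 0), and shows one standard way to write it as a difference of two convex functions, which is what makes DC programming applicable. The function names and the tuning parameter `tau` are illustrative choices, not taken from the paper, and the way MSM plugs this function into the modularity is not reproduced here.

```python
def truncated_l1(x, tau):
    """Truncated L1 function min(|x| / tau, 1); approximates the indicator I(x != 0)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return min(abs(x) / tau, 1.0)


def dc_parts(x, tau):
    """Return convex functions g(x), h(x) such that truncated_l1(x, tau) = g - h."""
    g = abs(x) / tau                   # scaled absolute value, convex
    h = max(abs(x) / tau - 1.0, 0.0)   # hinge of a convex function, convex
    return g, h


if __name__ == "__main__":
    tau = 0.1  # hypothetical tuning value; smaller tau tracks the indicator more closely
    for x in (0.0, 0.05, 0.1, 0.5, -2.0):
        g, h = dc_parts(x, tau)
        print(x, truncated_l1(x, tau), g - h)
```

As `tau` shrinks, the function approaches the indicator while staying continuous; a DC algorithm would typically linearize `h` at the current iterate and solve the resulting convex subproblem.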

The rest of this paper is organized as follows. We elaborate the definition of modularity and the proposed algorithm in Section 2. Then, we present the simulation results of the proposed algorithm and some related algorithms in Section 3, followed by some real data analyses in Section 4. Finally, we conclude this paper in Section 5.

Modularity

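First, we introduce some notation. Let $G=(V,E)$ denote a network with the node set $V=\{1,\ldots,n\}$ and the edge set $E\subseteq V\times V$, which can be formulated by the adjacency matrix $A\triangleq[A_{ij}]\in[0,+\infty)^{n\times n}$, where $A_{ij}>0$ if $(i,j)\in E$, otherwise $A_{ij}=0$. Suppose there is no self-loop in network $G$, i.e. $A_{ii}=0$ for each node $i\in V$. Let $\mu\triangleq\sum_{i=1}^{n}\sum_{j=1}^{n}A_{ij}$. For each $i\in V$, let $d_i^{\mathrm{out}}\triangleq\sum_{j=1}^{n}A_{ij}$ denote the out-degree and $d_i^{\mathrm{in}}\triangleq\sum_{j=1}^{n}A_{ji}$ denote the in-degree. If $G$ is undirected, then $d_i^{\mathrm{in}}=d_i^{\mathrm{out}}$, as $A_{ij}=A_{ji}$ for each $i,j\in V$.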
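To make the notation concrete, the following sketch computes the out-degrees, the in-degrees and the total edge weight from an adjacency matrix, and then evaluates the standard Newman-type modularity of a hard partition. Two assumptions are made here: the total weight is taken as the sum of all entries of A, and the modularity is written in the common directed form (1/mu) * sum over same-community pairs of (A_ij - d_i_out * d_j_in / mu). The paper's own reformulated modularity is not shown in this excerpt and may differ; the function names are illustrative.

```python
def degrees_and_total(A):
    """Return (d_out, d_in, mu) for a square adjacency matrix given as a list of lists."""
    n = len(A)
    d_out = [sum(A[i][j] for j in range(n)) for i in range(n)]  # row sums
    d_in = [sum(A[j][i] for j in range(n)) for i in range(n)]   # column sums
    mu = sum(d_out)                                             # sum of all entries of A
    return d_out, d_in, mu


def modularity(A, labels):
    """Newman-type modularity of a hard partition (assumed form, see the text above)."""
    n = len(A)
    d_out, d_in, mu = degrees_and_total(A)
    if mu == 0:
        raise ValueError("network has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # only same-community pairs contribute
                q += A[i][j] - d_out[i] * d_in[j] / mu
    return q / mu


if __name__ == "__main__":
    # Toy undirected network: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    A = [[0] * 6 for _ in range(6)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    print(modularity(A, [0, 0, 0, 1, 1, 1]))
```

For an undirected network the row and column sums coincide, so the same function covers both the directed and the undirected case.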

Suppose that G has K

Simulation study

In this section, we present some simulation results to demonstrate the performance of the proposed MSM method, comparing with some classical community detection algorithms based on modularity, GN, CNM and EIGN, proposed in Newman and Girvan (2004), Clauset et al. (2004) and Newman (2006a) respectively, two relaxation algorithms based on block models, i.e. SDP_1 proposed in Amini and Levina (2018) and CMM proposed in Chen et al. (2018), and two overlapping community detection methods proposed by 

Real data analyses

In this section, we investigate the performance of the proposed method and its competitors on seven commonly used real-world networks. A brief introduction of these networks is as follows. The network named Zachary’s Karate Club (Zachary, 1977) consists of 34 members and 78 edges, where the members were divided into two groups after a quarrel. The Visuotactile brain areas and connections (Négyessy et al., 2006) is a network describing the connections in the visual activity areas of the

Conclusion

In this paper, the MSM method has been established to find a proper balance between community detection accuracy and computational efficiency, which is implemented by maximizing a surrogate of modularity. On this basis, the NP-hard combinatorial problem of maximizing modularity is approximately transformed into a nonconvex optimization problem, which can be solved by DC programming. The convergence of the MSM method is provided, and its good performance is presented by some simulation

Acknowledgements

This work was supported by NSFC grants 11571068, 11631003 and 11690012, the Special Fund for Key Laboratories of Jilin Province, China, grant 20190201285JC, and the project of teaching reform of higher education of Jilin Province, China, grant JLL0824320190726182454.

References (35)

  • Fortunato, S. Community detection in graphs. Phys. Rep. (2010)
  • Holland, P.W., et al. Stochastic blockmodels: First steps. Social Networks (1983)
  • Amini, A.A., et al. Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. (2013)
  • Amini, A.A., et al. On semidefinite relaxations for the block model. Ann. Statist. (2018)
  • Ball, B., et al. Efficient and principled method for detecting communities in networks. Phys. Rev. E (2011)
  • Barabasi, A., et al. Network medicine: a network-based approach to human disease. Nature Rev. Genet. (2011)
  • Chen, M., et al. Community detection via maximization of modularity and its variants. Comput. Soc. Syst. IEEE Trans. (2014)
  • Chen, Y., et al. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Statist. (2018)
  • Clauset, A., et al. Finding community structure in very large networks. Phys. Rev. E (2004)
  • Flake, G., et al. Self-organization and identification of Web communities. Computer (2002)
  • Gastner, M.T., et al. Diffusion-based method for producing density-equalizing maps. Proc. Natl. Acad. Sci. (2004)
  • Gleiser, P.M., et al. Community structure in jazz. Adv. Complex Syst. (2003)
  • Guimera, R., et al. Self-similar community structure in a network of human interactions. Phys. Rev. E (2003)
  • Hoff, P.D., et al. Latent space approaches to social network analysis. J. Amer. Statist. Assoc. (2002)
  • Jeong, H., et al. The large-scale organization of metabolic networks. Nature (2000)
  • Jin, J., et al. Estimating network memberships by simplex vertex hunting. (2019)
  • Karrer, B., et al. Stochastic blockmodels and community structure in networks. Phys. Rev. E (2011)