1 Introduction

Network theory is an important tool for describing and analyzing complex systems across a variety of disciplines. Community structures, defined as groups of nodes that are more densely connected to each other than to the rest of the network, exist widely in real-world complex systems in sociology, biology, transportation, and other fields (Newman 2018). Discovering communities in these systems has become a primary approach to understanding how network structure relates to system behavior. As an effective technique for unveiling underlying structures, community detection has been used in many scenarios, such as finding potential friends in social media (Zhu et al. 2017), recommending products to users (Li and Zhang 2020), and analyzing social opinions (Wang et al. 2017).

As research has deepened, more scholars have come to realize that uncovering communities in a single network is insufficient for analyzing the structures and system behaviors found in real-life applications. Unlike the community structure in single-layer networks, communities in multilayer networks comprise groups of nodes that are well connected in all layers. For example, individuals in social networks may interact in various ways (e.g., sending emails, participating in the same activity) (Ansari et al. 2011). As a result, conventional studies face the essential problem of how to utilize the multiple views of the network (Papalexakis et al. 2013). Similar scenarios are described with related notions such as multiplex networks (Verbrugge 1979), multilevel networks (Wang et al. 2013), networks of networks (Gao et al. 2011), interdependent networks (Buldyrev et al. 2010), and multi-dimensional networks (Berlingerio et al. 2011c), all of which can generally be regarded as multilayer networks (Kivelä et al. 2014). Because more interaction information is available, community detection in multilayer networks has been introduced to leverage the various relationships and obtain more accurate results (Liu et al. 2018).

1.1 Background

Research on complex networks originated from graph theory, which began with the “Seven Bridges of Königsberg” problem in 1736. Although naive in many respects, the graph model has been extremely successful in many real-life applications. At the end of the 20th century, by employing the graph model, the famous small-world (Watts and Strogatz 1998) and scale-free (Barabási and Albert 1999) features were discovered (Newman 2018). In 2002, Girvan and Newman uncovered the community structure in networks (Girvan and Newman 2002), which triggered a sustained surge of related research. Communities are groups of nodes that are more strongly or frequently connected among themselves than with the others. Community detection therefore aims to find the most reasonable partition of a network from its observed topological structure. Several conventional approaches are listed in Table 1.

Table 1 Comparison of some classic community detection algorithms

In recent years, a large number of improved algorithms based on the above approaches have been proposed with fruitful results. However, with the exponential growth of data scale, community detection on large-volume data has encountered a series of problems. For example, some datasets change over time, which challenges traditional research on static graphs. In other words, traditional methods are ill-suited to large-scale time-varying networks.

Lancichinetti et al. (2009) raise two further problems. The first is hierarchical structure: small communities combine to form large communities, which in turn group together to form even larger ones. The second is overlapping communities: for example, people belong to more than one community, depending on their families, friends, professions, hobbies, etc. Nodes that belong to more than one community are pervasive in networks.

Besides, the interactions among nodes are becoming more complicated than ever. The conventional monolayer network (i.e., single-layer network) covers many cases in which a unit of a complex system is mapped to a node and the interactions between units are represented directly as edges, regardless of the type or weight of the interactions. With the development of network modeling, it has become clear that the existing models cannot fully capture the details of some real-life problems, which can even lead to incorrect descriptions of phenomena taking place on real-world networks. Some representative problems are listed as follows.

  1. Multiple interactions in social networks. There has been an increasing focus on social media such as Twitter, Facebook, and Google+. People share their opinions on daily affairs, chat with friends, or even trade on these platforms. The main problem in analyzing social networks is the multiple interactions among individuals. For example, the relationship between two individuals may be friendship, kinship, or being schoolmates. If we treat all relationships as indistinguishable edges, the differences are ignored, which probably leads to incorrect results.

  2. Interbank trades. Online payments are replacing traditional cash and credit card payments, which offers convenience but also provides new opportunities for financial criminals. For example, in money laundering activities (Colladon and Remondi 2017), criminals use different channels to conduct trades. If we analyze only the transfer records of a single bank with a graph model, the result may be unconvincing. Thus, it is necessary to collect data from all possible trade channels and use a multilayer network model in which each channel is regarded as a layer.

  3. Urban transportation system. The study of urban transportation systems has attracted wide public attention in the last decade (Black 2018; Sultana et al. 2019). Citizens travel in a variety of ways, such as bus, subway, and tram. In analyzing the public transportation system, the characteristics of the various modes of transportation should be fully considered, especially for hub stations (Zhang et al. 2018b), which deserve more attention when addressing traffic congestion. When bus routes are blocked, many passengers switch to the subway, which results in crowded subway operations. Inherently, the urban transportation system is a multilayer network.

By tackling the above problems with the multilayer network model (Kivelä et al. 2014; Boccaletti et al. 2014), we can obtain more reasonable results: community structures in multilayer networks help to identify functionally cohesive sub-units and to reveal complex interactions and heterogeneous links.

1.2 Main contributions

There have been numerous attempts to address the community detection problem in multilayer networks through diverse approaches. For example, Bazzi et al. (2016) identify communities in temporal networks by modularity maximization; the authors emphasize the difference between “null networks” and “null models” in modularity maximization and discuss the effect of interlayer edges on the multilayer modularity-maximization problem. De Bacco et al. (2017) propose a generative model for multilayer networks, which can be used to aggregate layers into clusters or to compress a dataset by identifying especially relevant or redundant layers. The model can incorporate both community detection and link prediction for multilayer networks, and experimental results on both synthetic and real-world datasets show its feasibility. Analyzing multilayer networks is of great importance because many interesting patterns cannot be obtained by analyzing single-layer networks. This is our motivation for summarizing these approaches. The contributions of this work are:

  1. We build a taxonomy of community detection methods based on the various techniques used.

  2. We provide a detailed survey of works that fall under the different categories.

  3. We categorize and summarize the evaluation measures for community detection results.

  4. We introduce the applications of community detection in multilayer networks, as well as interesting directions for future work.

To the best of our knowledge, this is the latest work that provides a comprehensive survey on various community detection methods in multilayer networks.

1.3 Outline of the paper

The remainder of the paper is organized as follows. In the next section, we present the multilayer network models with several real-world datasets and briefly compare the different definitions. Section 3 summarizes the existing community detection methods in multilayer networks and provides some metrics for quality evaluation. We introduce various applications of community detection in multilayer networks in Sect. 4, such as temporal networks, social networks, transportation systems, and biological systems. Section 5 offers concluding remarks and perspectives on future work.

2 Models

As mentioned above, numerous researchers have dedicated themselves to solving problems in their own settings. In the 1930s, sociograms (Roethlisberger and Dickson 2003) were proposed to represent social relationships in a bank wiring room; they involve 14 individuals connected by 6 different types of social interactions, as shown in Fig. 1. Such networks are known as “multiplex networks” (Mucha et al. 2010) or “multi-relational networks” (Cai et al. 2005), in which edges are categorized by type.

Fig. 1

The sociograms proposed by Roethlisberger et al., which contain 14 individuals connected by 6 different types of social interactions and show the friendship ties and cliques observed in a factory. Node positions reflect the locations of the individuals’ workspaces

In recent years, great efforts have been made to unveil the basic mechanisms that generate networks with specific structural properties. The analysis of networks has profound implications in very different fields, from social media analytics to biology (Newman 2018). The conventional graph model falls short when the network is differentiated, multipartite, integrated, and dynamic. Thus, a series of more complicated models have been proposed, such as temporal networks (Kostakos 2009), multiplex networks, k-partite networks, and so on (Boccaletti et al. 2014; Kivelä et al. 2014). However, the sudden and immense explosion of research on multilayer networks has also led to a great deal of confusion (Ahmed et al. 2018; Farooq and Zhu 2018; Kivelä et al. 2014; Paolucci 2018). The multilayer network we focus on in this paper is not a neural network but a mathematical model that can be used to represent complicated network structures.

In its simplest form, a multilayer network is a network made of multiple graphs, called layers, which share the same set of nodes but differ in their edges. To distinguish a multilayer network from a single-layer network, we compare the representations of a real-world dataset under the two models. The Al Qaeda cell remained isolated in safe hiding places while training and planning terrorist attacks, which forced the organization to form a relatively dense social network; its Hamburg branch planned and eventually participated in carrying out the September 11 attacks (Silber 2011). The social relationships of the 9/11 terrorists are represented in Fig. 2.

Fig. 2

The social relationships of 9/11 terrorists represented by a graph model. It is a single-layer network, which consists of 69 nodes and 159 edges. The size of the node represents the number of the neighbors (i.e., the node’s degree)

If we classify the links by the closeness of the social interactions among the terrorists, we obtain three groups of links, which can be abstracted into a three-layer network, as shown in Fig. 3.

Fig. 3

The 9/11 terrorists’ interactions represented by a multilayer network. \(L_1\) presents confirmed close contacts, \(L_2\) shows various recorded interactions, and \(L_3\) contains potential, planned, or unconfirmed interactions

It is obvious that the multilayer network reveals more detailed information than the monolayer network. The multilayer network model contains nodes and edges from different layers, which represent the different frequencies (or types) of interactions among the nodes.

The nodes in a multilayer network play the same role as in the graph model: they represent individuals, now across multiple layers. Some works indicate that the nodes in a multilayer network can be classified into different categories (e.g., in a bipartite network), so that the network composed of these nodes is described as a node-colored network (Baltakiene et al. 2018; Brummitt 2014; Kivelä et al. 2014). The different colors represent different types of nodes. For example, an urban transportation system mainly contains buses, subways, and so on. A bus stop and a subway station are indistinguishable in a conventional graph model but differ in a multilayer network because they belong to different modes of transportation, so they are drawn in different colors. Inherently, the layer information already covers the different node colors, i.e., the multilayer network model can represent them. The transportation network of Chengdu, plotted with muxViz (De Domenico et al. 2015b), is shown in Fig. 4.

Fig. 4

The transportation system in Chengdu city of China, where the left layer shows the bus lines and the right layer shows the subway system. The nodes from different layers are connected by interlayer edges if the corresponding stations are within 0.5 km

The edges in a multilayer network are classified into intralayer edges and interlayer edges (De Domenico et al. 2016). An intralayer edge has both of its endpoints in the same layer, while an interlayer edge (or cross-layer edge) connects nodes in different layers, as illustrated in Fig. 5.

Fig. 5

A three-layer toy network. The edges within each layer are intralayer edges, marked by solid lines; the dashed lines crossing adjacent layers represent interlayer edges
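As a minimal sketch of this distinction, assume each endpoint is stored as a hypothetical `(node, layer)` pair; this representation and the function name `split_edges` are chosen here for illustration and are not prescribed by the cited works. Edges can then be split into the two classes as follows:

```python
def split_edges(edges):
    """Split edges between (node, layer) pairs into intralayer and interlayer edges."""
    intra, inter = [], []
    for (u, lu), (v, lv) in edges:
        # An edge is intralayer when both endpoints lie in the same layer.
        if lu == lv:
            intra.append(((u, lu), (v, lv)))
        else:
            inter.append(((u, lu), (v, lv)))
    return intra, inter


# Toy three-layer example (illustrative data only).
edges = [
    ((1, "L1"), (2, "L1")),  # within layer L1
    ((2, "L2"), (3, "L2")),  # within layer L2
    ((1, "L1"), (1, "L2")),  # node 1 coupled across L1 and L2
    ((2, "L2"), (2, "L3")),  # node 2 coupled across L2 and L3
]
intra, inter = split_edges(edges)
```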

A layer in a multilayer network is composed of a set of nodes and edges, i.e., a graph. Layers fall into two categories:

  • Ordinal layers. The layers are sorted in a certain order, and the interlayer edges connect corresponding nodes in adjacent layers. In temporal networks, for example, there are numerous layers representing different snapshots, and the order of the layers is determined by the time sequence.

  • Categorical layers. The layers are classified into several groups, where each group represents a type of interaction.

2.1 Definitions

There are many terms for describing multilayer networks, such as multiplex network, multi-relational network, edge-colored network, node-colored network, multilevel network, multi-dimensional network, interdependent networks, networks of networks, temporal network, and so on (Boccaletti et al. 2014; Kivelä et al. 2014). Table 2 summarizes the main notation used throughout this paper.

Table 2 Main symbols used in this paper

As is well known, a graph is a tuple \(G = (V,E)\), where V is a set of nodes and \(E \subseteq V \times V\) is the set of edges that connect pairs of nodes (Bollobás 2013). The model of multilayer networks is more complicated, and there are mainly two formulations. The first is summarized by Kivelä et al. (2014) as

$$\begin{aligned} {\mathcal {M}}=(V_M, E_M, V, {\mathcal {L}}), \end{aligned}$$
(1)

where \(V_M\subseteq V\times {\mathcal {L}}_1\times {\mathcal {L}}_2\times \cdots \times {\mathcal {L}}_d\) is the set of node-layer tuples and \(E_M \subseteq V_M\times V_M\) is the set of edges between them. The sequence \({\mathcal {L}}\) of sets of elementary layers is

$$\begin{aligned} {\mathcal {L}}=\{{{\mathcal {L}}_\alpha }\}_{\alpha =1}^d, \end{aligned}$$
(2)

where \(\alpha \) indexes the aspects, d is the number of aspects, and \({\mathcal {L}}_\alpha \) is the set of elementary layers of aspect \(\alpha \). A layer is a combination of elementary layers, one from each aspect, i.e., an element of \({\mathcal {L}}_1 \times {\mathcal {L}}_2 \times \cdots \times {\mathcal {L}}_d\). An illustration of the multilayer network is given in Fig. 6.

Fig. 6

Illustration of the multilayer network model proposed by Kivelä et al. As shown in the left panel, the multilayer network has a total of four nodes, so \(V=\{1,2,3,4\}\). There are two aspects, with corresponding elementary-layer sets \({\mathcal {L}}_1=\{A,B\}\) and \({\mathcal {L}}_2=\{X,Y\}\). Therefore, there are four layers: (A, X), (A, Y), (B, X), and (B, Y). The right panel shows the representation of the conventional graph model with multiple labels of nodes
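The way layers arise as combinations of elementary layers can be sketched in a few lines of Python. The variable names are illustrative, and the assumption that every node is present in every layer is made here only to keep the example short; in general \(V_M\) may be a proper subset of the full product.

```python
from itertools import product

V = {1, 2, 3, 4}
# Elementary-layer sets L_1 and L_2 of the two aspects, as in Fig. 6.
elementary_layers = [["A", "B"], ["X", "Y"]]

# A layer is one elementary layer per aspect: an element of L_1 x L_2.
layers = list(product(*elementary_layers))

# Node-layer tuples. Assumption for this sketch: every node exists in
# every layer, so V_M is the full product V x L_1 x L_2.
V_M = {(v, *layer) for v in V for layer in layers}
```

With d aspects, the number of layers is the product of the sizes of the elementary-layer sets, here 2 × 2 = 4.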

Another model is proposed by Boccaletti et al. (2014), defined as

$$\begin{aligned} {\mathcal {M}}=({\mathcal {G}},{\mathcal {C}}), \end{aligned}$$
(3)

where \({\mathcal {G}} = \{ G_\alpha ; \alpha \in \{1, \ldots , L\}\}\) is a family of (directed or undirected, weighted or unweighted) graphs \(G_\alpha = (V_\alpha ,E_\alpha )\), which represent the layers of \({\mathcal {M}}\), and \({\mathcal {C}}\) describes the interactions between nodes of any two different layers, given by

$$\begin{aligned} {\mathcal {C}}=\{E_{\alpha \beta } \subseteq V_\alpha \times V_\beta ; \alpha ,\beta \in \{1,\ldots ,L\},\alpha \ne \beta \}. \end{aligned}$$
(4)

The two models differ in the notion of “aspect”. Kivelä’s model takes real-life situations into account, e.g., a social network contains relations of various types, times, or situations. Each such set of relations is regarded as an aspect, namely a classification of layers, which provides a more comprehensive perspective. The structure with aspects is therefore much richer than that of ordinary networks. Possible aspects include different types of interactions or communication channels, different subsystems, different spatial locations, different points in time, and so on (De Domenico et al. 2016). Boccaletti’s model, in contrast, is a general form that is easy to understand. In particular, the supra-adjacency matrix is a convenient tool for representing a multilayer network, defined as

$$\begin{aligned} {\mathcal {M}}= { \left[ \begin{array}{cccc} A_1 &{} I_{12} &{} \cdots &{} I_{1L} \\ I_{21} &{} A_2 &{} \cdots &{} I_{2L} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ I_{L1} &{} I_{L2} &{} \cdots &{} A_L \end{array} \right] } \in {\mathbb {R}}^{N \times N}, \end{aligned}$$
(5)

where \(A_1, A_2,\ldots ,A_L\) are the adjacency matrices of layers \(1, 2,\ldots , L\), respectively. N is the total number of nodes, which can be calculated by \(N= \sum _{\alpha =1}^{L} |V_\alpha |\). The off-diagonal block \(I_{\alpha \beta }\) represents the interlayer edges between layer \(\alpha \) and layer \(\beta \). Thus, the set of interlayer edges can be represented as

$$\begin{aligned} I=\bigcup _{\alpha ,\beta =1,\alpha \ne \beta }^L I_{\alpha \beta }. \end{aligned}$$
(6)
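The block structure of Eq. (5) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function name `supra_adjacency`, the 0-based layer indices, and the convention that missing interlayer blocks are zero are choices made here, and the two-layer data are a toy example rather than any dataset discussed in this paper.

```python
import numpy as np


def supra_adjacency(layers, interlayer):
    """Assemble the supra-adjacency matrix of Eq. (5).

    layers:     list of square adjacency matrices A_1, ..., A_L
    interlayer: dict mapping (alpha, beta), alpha != beta (0-based), to the
                |V_alpha| x |V_beta| block I_{alpha beta}; missing pairs are
                treated as all-zero blocks (an assumption of this sketch)
    """
    sizes = [np.asarray(A).shape[0] for A in layers]
    L = len(layers)
    blocks = []
    for a in range(L):
        row = []
        for b in range(L):
            if a == b:
                row.append(np.asarray(layers[a], dtype=float))
            else:
                zero = np.zeros((sizes[a], sizes[b]))
                row.append(np.asarray(interlayer.get((a, b), zero), dtype=float))
        blocks.append(row)
    return np.block(blocks)


# Toy example: two layers on three nodes each, with every node coupled
# to its own copy in the other layer (identity interlayer blocks).
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])
coupling = {(0, 1): np.eye(3), (1, 0): np.eye(3)}
M = supra_adjacency([A1, A2], coupling)
```

Here `M` has shape N × N with N = 3 + 3 = 6, the diagonal blocks reproduce `A1` and `A2`, and the off-diagonal blocks hold the interlayer edges.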

Taking the above-mentioned 9/11 terrorists’ network as an example, the supra-adjacency matrix is shown in Fig. 7.

Fig. 7

Supra-adjacency representation of the 9/11 terrorists’ network. The supra-adjacency matrix is represented as a block matrix, where the rows and columns depict the terrorists. The diagonal blocks indicate the interactions within each layer, while the off-diagonal blocks indicate terrorists who are simultaneously active in different aspects of the observed social interactions

We also list below some specific networks that can be represented as multilayer networks.

A multiplex network is a special case of a multilayer network (Solá et al. 2013), where all layers share the same set of nodes but may have different types of interactions. Some works use the terms multi-relational networks, multidimensional networks (Berlingerio et al. 2011b), or edge-colored networks (no interlayer edges) instead. The underlying limitation is that the network is node-aligned, i.e., each layer has the same nodes and the layers differ only in their edges; the nodes in different layers are in one-to-one correspondence. This model reduces the complexity of the general multilayer form and is therefore widely used to deal with some special problems (Kanawati 2015).

Temporal networks (or time-varying networks) differ from conventional dynamic networks in that they focus more on the ordinal variations of connections (Kostakos 2009). Temporal networks have their own set of research models (Kempe et al. 2002; Kostakos 2009; Tang et al. 2012a), which capture their dynamic characteristics. It must be mentioned, however, that a strength of the multilayer network is that it is compatible with the representation of temporal networks and can likewise describe their dynamic characteristics. For example, the multilayer network model introduced above (Boccaletti et al. 2014) can regard the network structure at each moment as a layer, with the layers arranged in chronological order. We collected the relationships among characters in Game of Thrones (the first five seasons), as shown in Figs. 8 and 9.

Fig. 8

The monolayer network representation of the relationships among characters in the first five seasons of “Game of Thrones”, which contains 796 characters and 2823 links among them. The size of a node depicts its degree centrality; e.g., Tyrion Lannister, Jon Snow, and Daenerys Targaryen, three key characters in the story, have larger degrees

Fig. 9

The multilayer network representation of the relationships among characters in the first five seasons of “Game of Thrones”. Each layer represents a season, and the links between consecutive layers represent the correspondence of characters across seasons
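A temporal network can be turned into ordinal layers along the lines of the following sketch. It assumes that an interlayer edge links a node to its own copy in the next snapshot whenever the node appears in both; the function name `ordinal_interlayer_edges` and the toy snapshots are hypothetical and are not taken from the Game of Thrones data.

```python
def ordinal_interlayer_edges(snapshots):
    """Build interlayer edges between consecutive (ordinal) layers.

    snapshots: list of node sets, one per layer, in chronological order.
    Returns edges ((v, t), (v, t + 1)) for every node v that is present
    in both layer t and layer t + 1.
    """
    edges = []
    for t in range(len(snapshots) - 1):
        shared = snapshots[t] & snapshots[t + 1]
        for v in sorted(shared):
            edges.append(((v, t), (v, t + 1)))
    return edges


# Toy snapshots: node "c" disappears after the first layer, "d" appears later.
snapshots = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
couplings = ordinal_interlayer_edges(snapshots)
```

Because only adjacent layers are coupled, the order of the list carries the time sequence; shuffling the snapshots would change the resulting interlayer edges.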

The study of k-partite networks starts from the k-partite graph (i.e., a graph whose vertices are decomposed into k disjoint sets such that no two vertices within the same set are adjacent) (Brouwer and Haemers 2012). Thus, the k-partite network is also described as a node-colored network (Kivelä et al. 2014), where nodes in the same layer are not connected to each other but may share common neighbors in other layers. A sample k-partite network is shown in Fig. 10.

Fig. 10

Illustration of a k-partite network, where \(k=3\). There are no intralayer edges in any layer of the network; all edges connect nodes across adjacent layers

A special case of k-partite networks (\(k=2\)) is the bipartite network (or two-mode network), which is more familiar. For example, customers’ transaction records of products can be represented by a “customer-product” network. Fig. 11 presents a classic bipartite network, namely the Southern women activities network (Davis et al. 2009).

Fig. 11

Illustration of the Southern women activities network, which is a typical bipartite network. A network qualifies as bipartite if there are no links between nodes of the same type. All links connect nodes of different types, corresponding to the off-diagonal elements shown in the middle panel. The right panel presents the corresponding supra-adjacency matrix
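The bipartite condition and the resulting block matrix can be sketched as below. The names `is_bipartite_by_type` and `bipartite_supra_adjacency` are illustrative, and the small person-event matrix is a made-up toy example, not the Southern women data.

```python
import numpy as np


def is_bipartite_by_type(edges, node_type):
    """True if every edge joins two nodes of different types."""
    return all(node_type[u] != node_type[v] for u, v in edges)


def bipartite_supra_adjacency(B):
    """Place a biadjacency matrix B (type-1 rows, type-2 columns) in the
    off-diagonal blocks; the diagonal blocks are zero because there are
    no links between nodes of the same type."""
    B = np.asarray(B, dtype=float)
    n1, n2 = B.shape
    return np.block([[np.zeros((n1, n1)), B], [B.T, np.zeros((n2, n2))]])


# Toy data: three persons (p*) attending two events (e*).
node_type = {"p1": "person", "p2": "person", "p3": "person",
             "e1": "event", "e2": "event"}
edges = [("p1", "e1"), ("p2", "e1"), ("p2", "e2"), ("p3", "e2")]
B = np.array([[1, 0], [1, 1], [0, 1]])
S = bipartite_supra_adjacency(B)
```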

2.2 Features

Many basic metrics, such as centrality and node similarity, are commonly used by community detection algorithms in monolayer networks. In multilayer networks, these metrics need to be reformulated and adapted. Thus, in this section, some of the most important features of multilayer networks are introduced. Studies of structural properties include descriptors that identify the most central nodes according to various notions of importance (Battiston et al. 2014; De Domenico et al. 2015c, 2013; Halu et al. 2013; Solá et al. 2013) and that quantify triadic relations such as clustering and transitivity (Battiston et al. 2014; Cozzo et al. 2015; De Domenico et al. 2013).

2.2.1 Centrality

The ranking of nodes in multilayer networks is one of the most pressing and challenging tasks that research on complex networks is currently facing (Lü et al. 2016). Many centrality measures have been used for single-layer networks to rank the importance of the nodes, such as degree centrality (Bonacich 1972), betweenness centrality (Freeman 1977), closeness centrality (Freeman 1978), k-shell centrality (Carmi et al. 2007), eigenvector centrality (Bonacich 1987), PageRank (Brin and Page 1998), Leader-Rank (Lü et al. 2011), Local Centrality (Chen et al. 2012), Bridge-Rank (Salavati et al. 2018), and so on. The above measures are widely used in the monolayer network model, while the study on nodes centrality in multilayer networks is still an open issue.

Before introducing centrality, we first cover the notion of neighborhood in a multiplex network. There are two definitions: under a strict aggregation concept, j is a neighbor of i if j is connected to i in every layer; under a loose one, j is a neighbor of i if it is connected to i in at least one layer (Kazienko et al. 2010). Alhajj and Rokne (2014) give a trade-off between the two: j is a neighbor of i if it is connected to i in at least m layers, where \(1\le m\le L\) and L is the total number of layers. This definition can be used to analyze node centrality in a multiplex network with numerous layers, but it is not ideal because it introduces another threshold on the number of layers. Moreover, applying the concept to the general form of multilayer networks raises further problems. The first concerns the correspondence of nodes across layers: a node i may have no counterpart in other layers, e.g., in a k-partite network where the nodes of each layer are entirely different. The second concerns the representation of a node’s neighborhood: degree centrality can hardly distinguish nodes that have the same total number of edges but different proportions of interlayer and intralayer edges. The third is whether interlayer edges and intralayer edges should be given equal weight.
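The three neighborhood definitions differ only in the threshold m, which the following sketch makes explicit. The function name `multiplex_neighbors` and the adjacency-dictionary representation are assumptions made for illustration.

```python
def multiplex_neighbors(layers, i, m):
    """Return the nodes connected to i in at least m layers.

    layers: list of dicts mapping each node to the set of its neighbors
            in that layer.
    m = 1 gives the loose definition (at least one layer) and
    m = len(layers) gives the strict one (every layer).
    """
    counts = {}
    for adj in layers:
        for j in adj.get(i, set()):
            counts[j] = counts.get(j, 0) + 1
    return {j for j, c in counts.items() if c >= m}


# Toy three-layer multiplex network (illustrative data only).
layers = [
    {0: {1, 2}, 1: {0}, 2: {0}},
    {0: {1}, 1: {0}},
    {0: {1, 3}, 1: {0}, 3: {0}},
]
loose = multiplex_neighbors(layers, 0, 1)
strict = multiplex_neighbors(layers, 0, len(layers))
```

In this toy network, node 0 has three neighbors under the loose definition but only node 1 under the strict one, which shows how strongly the choice of m shapes any degree-based centrality.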

In monolayer networks, one of the main centrality measures is the degree of each node: the more links a node has, the more important the node is. In a multiplex network, the degree of node i is a vector (Battiston et al. 2014), defined as

$$\begin{aligned} k_i=(k_i^1,k_i^2,\ldots ,k_i^L), \end{aligned}$$
(7)

where \(k_i^l\) is the degree of node i in the \(l\)th layer. It can also be aggregated into a single scalar for simplification, given by

$$\begin{aligned} k_i=\sum _{l=1}^L k_i^l. \end{aligned}$$
(8)
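Equations (7) and (8) can be computed directly from the layer adjacency matrices. The sketch below assumes undirected layers without self-loops that are stored as NumPy arrays over a common node set; the function name `degree_vector` is illustrative.

```python
import numpy as np


def degree_vector(layers, i):
    """Degree of node i in each layer, Eq. (7).

    layers: list of adjacency matrices over the same node set; an entry
            greater than zero means that the edge exists.
    """
    return np.array([int(np.count_nonzero(np.asarray(A)[i] > 0)) for A in layers])


# Toy two-layer multiplex network on three nodes (illustrative data only).
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])

k_1 = degree_vector([A1, A2], 1)   # per-layer degrees of node 1
k_1_total = int(k_1.sum())         # aggregated degree, Eq. (8)
```

The vector form keeps the information that node 1 is active only in the first layer, which the aggregated scalar discards.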

Another approach to the centrality of nodes in multilayer networks is based on diffusion across multiple layers. Applications of this approach include influential node identification, viral marketing, and information diffusion (Lü et al. 2016). The network structure is more complicated and the computational complexity is higher. Some scholars believe that interlayer edges should be treated differently from intralayer edges when calculating node centrality, while others hold that interlayer edges make the same contribution to the centrality calculation. Likewise, the degree centrality used in multilayer networks can be replaced by other measures, e.g., eigenvector centrality (Solá et al. 2013).

Above all, in the general form of a multilayer network, the nodes may differ from layer to layer, so we cannot identify corresponding nodes and calculate node centrality directly. This problem has drawn much attention in recent years, with popular lines of work including user identification across multiple networks (Carmagnola and Cena 2009; Feng et al. 2017; Liang et al. 2015; Yang et al. 2018), social network coalescence, and network alignment (Bayati et al. 2013).

2.2.2 Correlations

Multilayer networks encode significantly more information than their isolated single layers, since they incorporate correlations between the nodes in different layers and between the statistical properties of layers. The correlations of multilayer networks include interlayer degree correlations (Nicosia and Latora 2015), layer overlap (Kao and Porter 2018), and so on. Some scholars point out that communities in multilayer networks should account for overlapping features while allowing the communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure (De Bacco et al. 2017). The local overlap (Cellai et al. 2016) is defined as

$$\begin{aligned} o_i^{\alpha \beta }=\sum _j \theta (\omega _{ij}^\alpha )\theta (\omega _{ij}^\beta ), \end{aligned}$$
(9)

where \(o_i^{\alpha \beta }\) is the number of overlapping edges that are incident to node i in both layer \(\alpha \) and layer \(\beta \), and \(\omega _{ij}^\alpha \) and \(\omega _{ij}^\beta \) are the weights of the intralayer edge (i, j) in layer \(\alpha \) and layer \(\beta \), respectively. \(\theta (x)=1\) if \(x>0\) and \(\theta (x)=0\) otherwise. Based on this concept, Kao and Porter propose a method that groups structurally similar layers in multiplex networks and finds meaningful groups of layers (Kao and Porter 2018). Considering the degree in undirected multiplex networks, the connection similarity is defined as

$$\begin{aligned} \phi ^{\alpha \beta } = \frac{1}{N} \sum _i \phi _i^{\alpha \beta } \in [0,1], \end{aligned}$$
(10)

where \(\phi _i^{\alpha \beta }\) is local similarity, defined as

$$\begin{aligned} \phi _i^{\alpha \beta }= \frac{o_i^{\alpha \beta }}{k_i^\alpha +k_i^\beta -o_i^{\alpha \beta }}. \end{aligned}$$
(11)
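The overlap-based similarity of Eqs. (9) to (11) can be sketched with NumPy as follows. Two assumptions are made here: the denominator of the local similarity is read as the Jaccard-type quantity \(k_i^\alpha + k_i^\beta - o_i^{\alpha \beta }\), and a node that is isolated in both layers (zero denominator) is assigned a similarity of 0. The function names are illustrative.

```python
import numpy as np


def local_overlap(W_a, W_b):
    """o_i^{ab} for every node i, Eq. (9): edges present in both layers."""
    return ((np.asarray(W_a) > 0) & (np.asarray(W_b) > 0)).sum(axis=1)


def connection_similarity(W_a, W_b):
    """Average local similarity between two layers, Eqs. (10) and (11)."""
    o = local_overlap(W_a, W_b)
    k_a = (np.asarray(W_a) > 0).sum(axis=1)
    k_b = (np.asarray(W_b) > 0).sum(axis=1)
    denom = k_a + k_b - o
    # Assumption: nodes with no edges in either layer contribute 0.
    phi_i = np.divide(o, denom, out=np.zeros(len(o), dtype=float), where=denom > 0)
    return float(phi_i.mean())


# Toy layers on three nodes: a path 0-1-2 and a star centered on node 0.
W_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
W_b = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
phi = connection_similarity(W_a, W_b)
```

The two toy layers share only the edge (0, 1), so nodes 0 and 1 each have local similarity 1/2, node 2 has 0, and the layer-level similarity is 1/3.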

The correlations of nodes are of great importance in analyzing multilayer network structures. Zhan et al. improved the community detection algorithms in multi-relational social networks by utilizing triangles and latent factor cosine similarity prior methods (Zhan et al. 2018). The local topological correlations between any pair of nodes from different layers are also utilized to calculate the centrality of nodes in multilayer networks (Kuncheva and Montana 2015).

3 Methods

Communities are known as groups of nodes that are more strongly or frequently connected among themselves than with the remainder of the network (Aldecoa and Marín 2013). Although connections to nodes outside the group are possible, nodes usually have a higher linking probability within the group (Fortunato and Hric 2016). Community detection is a fundamental issue in network science, and most existing approaches have been developed for monolayer networks. However, many complex systems are composed of networks coupled through different layers, where each layer represents one of many possible types of interactions.

3.1 Problem statement

Communities in single-layer networks comprise a group of well-connected nodes, while in multilayer networks, communities reveal the relationships among nodes in various layers. The comparison of communities on toy examples in single-layer networks and multilayer networks is given in Fig. 12.

Fig. 12

Comparison of communities in monolayer and multilayer networks. The nodes with different gray levels represent different communities. (a) Communities in a monolayer network. (b) Communities in a two-layered network, each community shares nodes in Layer 1 and Layer 2

The problem of community detection algorithms in multilayer networks is to divide the network into a set of disjoint cohesive modules \(C_1, C_2, \ldots , C_k\) where each module \(C_k\) is comprised of a group of nodes densely connected inside and loosely connected outside the community. It can be described as

$$\begin{aligned} \cup _{i=1}^k C_i = \sum _{\alpha =1}^L V_\alpha , \end{aligned}$$
(12)

with \(C_i \cap C_j = \emptyset \) for all \(i \ne j\) in non-overlapping community detection, and \(C_i \cap C_j \ne \emptyset \) for at least one pair \(i \ne j\) in overlapping scenarios. Dalibard (2012) gives three requirements for community detection as follows:

  1. The community detection should allow for overlapping communities.

  2. The detected results should be statistically significant, which means that applying the algorithm to a random null model should return no communities.

  3. The detected results should be hierarchical.

In practice, few of the existing approaches satisfy all three requirements while retaining reasonable efficiency. Besides, the evaluation metrics also vary from one study to another, as introduced in the following subsection.

3.2 Evaluation functions

Quality evaluation for partition results is a complex task due to the lack of a shared and universally accepted definition of community structures. A wide variety of quality functions have been proposed to solve the community detection problem from different perspectives. Several representative metrics are analyzed and classified into three categories (Cazabet et al. 2015):

  1. Single score metrics

  2. Evaluation on generated networks

  3. Evaluation on real networks with ground truth

Single score metrics employ quality functions that associate a score with a community detection result. For example, the number of edges between partitions can be utilized to evaluate the performance of a given algorithm. Popular metrics include modularity (Newman and Girvan 2004), modularity density (Li et al. 2008), surprise (Aldecoa and Marín 2011), and so on. These metrics are universal, but they are often criticized because there is no consensus on which of several meaningful levels of partition should be favored. Evaluation on generated networks, e.g., the LFR benchmark (Lancichinetti et al. 2008), is widely used to compare the detected partitions with the planted community affiliations. Because the planted communities are known, it is easy to recognize a good result and to evaluate how an algorithm responds to variations in community structure. However, the generated networks are sometimes unrealistic and differ from the partitions we want. Evaluation on real networks with ground truth seems to be a reasonable solution. Normalized Mutual Information (Danon et al. 2005), Precision, Recall, and \(F_\text {1-score}\) (Herlocker et al. 2004; Perry et al. 1955) are representative measures. However, these measures depend on a priori partition labels, which are unknown for most real-world datasets.

3.2.1 Modularity, modularity density, performance and surprise

In 2004, modularity was proposed by Newman and Girvan (2004) as an evaluation function for community partitions, defined as

$$\begin{aligned} Q=\frac{1}{2m} \sum _{ij} \left( A_{ij} - \frac{k_i k_j}{2m}\right) \delta (C_i, C_j), \end{aligned}$$
(13)

where m is the number of edges, \(A_{ij}\) is the element of the adjacency matrix, \(k_i\) is the degree of node i, and \(\delta (C_i,C_j)\) equals 1 if i and j are in the same community and 0 otherwise. In 2010, Mucha et al. proposed \(Q_M\) (Mucha et al. 2010) for evaluating communities in time-dependent, multiscale and multiplex networks, defined as

$$\begin{aligned} Q_M = \frac{1}{2\mu }\sum _{ij\alpha \beta }\left\{ \left( A_{ij\alpha }-\gamma \frac{k_{i\alpha } k_{j\alpha }}{2m_\alpha }\right) \delta _{\alpha \beta } + \delta _{ij}\ C_{j\alpha \beta } \right\} \delta (g_{i\alpha },g_{j\beta }), \end{aligned}$$
(14)

where \(\mu \) denotes the number of links in the multiplex network and \(\gamma \) is the resolution parameter. \(A_{ij\alpha }\) is the element of the adjacency matrix of layer \(\alpha \), \(k_{i\alpha }\) is the degree of node i in layer \(\alpha \), \(m_\alpha \) is the number of edges in layer \(\alpha \), and \(g_{i\alpha }\) is the community of node i in layer \(\alpha \). \(C_{j\alpha \beta }\) represents the interlayer edge connecting node j in layer \(\alpha \) with its counterpart in layer \(\beta \). This metric introduces a coupling between communities in neighboring layers by allowing interlayer edges, while different values of \(\gamma \) enable the detection of communities at different scales. However, \(\gamma \) must be chosen manually, and an inappropriate value may fail to produce reasonable results. Afterward, a variant of \(Q_M\) was given (Pramanik et al. 2017) as

$$\begin{aligned} Q=\frac{1}{2m} \sum _{ij} (A_{ij}-P_{ij}) \delta (\psi _i,\psi _j), \end{aligned}$$
(15)

where \(\delta (\psi _i,\psi _j)\) is the Kronecker delta function, which equals 1 iff \(\psi _i=\psi _j\), i.e., i and j belong to the same community, and 0 otherwise. The penalty term \(P_{ij}\) is the expected probability of an edge existing between nodes i and j if edges are placed at random, given as

$$ \begin{aligned} P_{ij} = {\left\{ \begin{array}{ll} P_{ij}^1, &{}\text {if }i \in V^1 \& j \in V^1 \\ P_{ij}^2, &{}\text {if }i \in V^2 \& j \in V^2 \\ P_{ij}^{12}, &{}\text {if }i \in V^1 \& j \in V^2 \text { or } i \in V^2 \& j \in V^1 \end{array}\right. }, \end{aligned}$$
(16)

where \(P_{ij}^1=(h_i \times h_j)/(2|E^1|)\), \(P_{ij}^2=(h_i \times h_j)/(2|E^2|)\), and \(P_{ij}^{12}=(c_i \times c_j)/(2|I_{12}|)\). Here \(h_i\) and \(h_j\) are the intralayer degrees of nodes i and j, \(c_i\) and \(c_j\) are their respective coupling degrees, and \(|I_{12}|\) is the number of interlayer edges between layers \(L_1\) and \(L_2\). Like single-layer modularity, the multiplex modularity is also criticized for its resolution limit (Vaiana and Muldoon 2018).
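As a concrete illustration of Eq. (13), the following minimal Python sketch computes single-layer modularity for a small undirected, unweighted graph. The function name `modularity`, the edge-list input format, and the toy graph (two triangles joined by one edge) are assumptions made for illustration. The sketch uses the equivalent per-community form \(Q=\sum _c [e_c/m-(d_c/2m)^2]\), where \(e_c\) is the number of intra-community edges and \(d_c\) is the total degree of community c.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of Eq. (13) for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = 0
    degree = defaultdict(int)
    internal = defaultdict(int)      # e_c: edges with both ends in community c
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    degree_sum = defaultdict(int)    # d_c: total degree of community c
    for node, k in degree.items():
        degree_sum[community[node]] += k

    # Summing Eq. (13) community by community gives e_c/m - (d_c/2m)^2.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))                        # 5/14, about 0.357
print(modularity(edges, {v: "a" for v in range(6)}))   # 0.0 for one big community
```

Placing all nodes in one community yields Q = 0, which is why modularity is usually read relative to this trivial baseline.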

Modularity density (Li et al. 2008) was proposed to address the resolution limit problem, defined as

$$\begin{aligned} D=\sum _{i=1}^{c} \frac{L\left( V_{i}, V_{i}\right) -L\left( V_{i}, {\bar{V}}_{i}\right) }{\left| V_{i}\right| }, \end{aligned}$$
(17)

where c is the total number of communities and \(\left| V_{i}\right| \) is the number of nodes in the i-th community. \(L\left( V_{i}, V_{i}\right) =\varSigma _{j \in V_{i}, k \in V_{i}} A_{j k}\) measures the connections within the i-th community, and \(L\left( V_{i}, {\bar{V}}_{i}\right) =\varSigma _{j \in V_{i}, k \in {\bar{V}}_{i}} A_{j k}\) measures the connections between the i-th community and other communities.
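Eq. (17) can be evaluated directly from an adjacency structure. The sketch below is a minimal illustration under our own assumptions: the graph is undirected and unweighted, it is stored as a dictionary of neighbor sets, and the function name `modularity_density` is hypothetical. Note that the double sum over \(A_{jk}\) counts every internal edge twice, which the code reproduces.

```python
def modularity_density(adj, communities):
    """Modularity density D of Eq. (17).

    adj: dict mapping each node to the set of its neighbours (undirected).
    communities: list of node sets V_1, ..., V_c.
    """
    total = 0.0
    for members in communities:
        members = set(members)
        # L(V_i, V_i): ordered pairs (j, k) inside the community with A_jk = 1.
        inside = sum(1 for j in members for k in adj[j] if k in members)
        # L(V_i, complement of V_i): links leaving the community.
        outside = sum(1 for j in members for k in adj[j] if k not in members)
        total += (inside - outside) / len(members)
    return total


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity_density(adj, [{0, 1, 2}, {3, 4, 5}]))   # 10/3
print(modularity_density(adj, [{0, 1, 2, 3, 4, 5}]))     # 14/6
```

On this toy graph the two-triangle split scores higher than the single-community partition, in line with the intent of penalizing merged communities.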

In 2010, Performance (Fortunato 2010) was proposed as a community quality function, which mainly considered the number of correctly “interpreted” pairs of nodes (i.e., the nodes belonging to the same community and connected by an edge, and nodes belonging to different communities and not connected by an edge), defined as

$$\begin{aligned} {\mathcal {P}}=\frac{\left| \left\{ (i, j) \in E, C_{i}=C_{j}\right\} \right| +\left| \left\{ (i, j) \notin E, C_{i} \ne C_{j}\right\} \right| }{n(n-1) / 2}, \end{aligned}$$
(18)

where E denotes the edge set, \(C_i\) and \(C_j\) denote the communities of nodes i and j, respectively, and n is the total number of nodes. Coverage (Fortunato 2010) is defined as the ratio of the number of intra-community edges to the total number of edges, given as

$$\begin{aligned} {\mathcal {C}}=\frac{\left| \left\{ (i, j) \in E, C_{i}=C_{j}\right\} \right| }{|E|}. \end{aligned}$$
(19)
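Both quantities can be obtained in one pass over all node pairs. The following sketch is an illustration only: the function name `performance_and_coverage` and the input format are assumptions, and coverage is computed according to its verbal definition, i.e., the fraction of edges that fall inside communities.

```python
from itertools import combinations


def performance_and_coverage(nodes, edges, community):
    """Return (performance, coverage) for an undirected, unweighted graph.

    nodes: list of nodes; edges: iterable of (u, v) pairs;
    community: dict mapping node -> community label.
    """
    edge_set = {frozenset(e) for e in edges}
    intra_edges = 0        # linked pairs placed in the same community
    inter_non_edges = 0    # unlinked pairs placed in different communities
    for u, v in combinations(nodes, 2):
        same = community[u] == community[v]
        linked = frozenset((u, v)) in edge_set
        if linked and same:
            intra_edges += 1
        elif not linked and not same:
            inter_non_edges += 1
    n = len(nodes)
    performance = (intra_edges + inter_non_edges) / (n * (n - 1) / 2)
    coverage = intra_edges / len(edge_set)
    return performance, coverage


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(performance_and_coverage(list(range(6)), edges, split))  # (14/15, 6/7)
```

The pairwise loop is quadratic in the number of nodes, so this form is only meant for small examples.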

In 2011, Aldecoa and Marín (2011) proposed “Surprise” as a measure for detecting communities. Rather than considering only the number of edges in a partition, this metric also takes the number of nodes into account, and it is capable of overcoming the resolution limit problem and detecting small communities (Fortunato and Barthelemy 2007; Lancichinetti and Fortunato 2011). It measures how improbable a given partition is compared to a null model, defined as

$$\begin{aligned} S =-\log \sum _{j=p}^{\min (M,n)} \frac{ \left( {\begin{array}{c}M\\ j\end{array}}\right) \left( {\begin{array}{c}F-M\\ n-j\end{array}}\right) }{\left( {\begin{array}{c}F\\ n\end{array}}\right) }, \end{aligned}$$
(20)

where F is the maximum possible number of links in the network (i.e., \(k[k-1]/2\), with k being the number of nodes), n is the observed number of links, M is the maximum possible number of intracommunity links for a given partition, and p is the number of intracommunity links observed in that partition. The parameter S, which stands for Surprise, measures the “surprise” (improbability) of finding by chance a partition with the observed enrichment of intracommunity links in a random graph. The authors state that surprise implicitly assumes a more complex definition of community: a precise number of units for which the density of links is statistically unexpected given the features of the network. In 2013, they also designed surprise maximization methods for detecting community structure in complex networks (Aldecoa and Marín 2013). Experiments on the human brain network (Fox and Lancaster 2002; Laird et al. 2005) have also indicated its advantage over modularity (Nicolini and Bifone 2016).
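Eq. (20) is the negative logarithm of a hypergeometric tail and can be evaluated exactly with integer binomial coefficients. The sketch below is a minimal illustration with several assumptions of ours: p is taken to be the number of intracommunity links observed in the partition, the natural logarithm is used (the base only rescales S), and the function name `surprise` is hypothetical.

```python
from math import comb, log


def surprise(F, M, n, p):
    """Surprise S of Eq. (20).

    F: maximum possible number of links, M: maximum possible number of
    intracommunity links, n: observed links, p: observed intracommunity links.
    """
    total = comb(F, n)
    tail = sum(comb(M, j) * comb(F - M, n - j)
               for j in range(p, min(M, n) + 1))
    # Work with logs of exact integers to avoid floating-point underflow.
    return -(log(tail) - log(total))


# Toy graph (hypothetical): two triangles joined by one edge, split in two.
k = 6                      # number of nodes
F = k * (k - 1) // 2       # 15 possible links
M = 3 + 3                  # possible links inside the two triangles
n, p = 7, 6                # 7 links observed, 6 of them intracommunity
print(surprise(F, M, n, p))   # log(715), about 6.57
print(surprise(F, M, n, 0))   # zero: no enrichment is required at all
```

For large networks the binomial coefficients become very large integers; Python handles them exactly, but a log-gamma formulation would be a natural alternative.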

3.2.2 MDL, Pareto frontier and redundancy

The fundamental idea behind the minimum description length (MDL) principle is that any regularity in a given set of data can be used to compress the data, i.e., to describe it using fewer symbols than needed to describe the data literally (Grunwald 2004). Rosvall and Bergstrom convert the community detection task into a coding problem following the MDL principle (Rosvall and Bergstrom 2008). Analogously, community detection in multilayer networks can be considered as a multi-objective optimization problem. The optimal partitioning for a multilayer network is achieved by maximizing a local evaluation indicator (e.g., local modularity (Chen et al. 2018b)) in each layer, i.e.,

$$\begin{aligned} C = \arg \max _C[f_1(C),f_2(C), \ldots , f_k(C)], \end{aligned}$$
(21)

where k denotes the number of objectives, one per layer. However, calculating an exact Pareto front is, in general, a challenging task. The most popular approximate methods are genetic algorithms, which employ biologically inspired heuristics to attempt to transform randomly selected seed cases into solutions on the Pareto front (Caramia and Dell’Olmo 2008).
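When only a finite set of candidate partitions is compared, the Pareto front of Eq. (21) can be extracted by exhaustive pairwise comparison. The sketch below illustrates the dominance relation only; it is not a genetic algorithm, and the names `dominates` and `pareto_front` as well as the per-layer scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Return the names of the non-dominated candidates.

    candidates: dict mapping a partition name to its tuple of per-layer scores
    (f_1(C), ..., f_k(C)).
    """
    return [name for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other in candidates.values())]


# Hypothetical per-layer quality scores of four candidate partitions.
scores = {"P1": (0.40, 0.10), "P2": (0.30, 0.30),
          "P3": (0.10, 0.40), "P4": (0.20, 0.20)}
print(pareto_front(scores))   # ['P1', 'P2', 'P3']; P4 is dominated by P2
```

The comparison is quadratic in the number of candidates, which is acceptable for small populations but not for the full space of partitions.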

Berlingerio et al. define the redundancy index \(\rho _c\) (Berlingerio et al. 2011a), which captures the phenomenon that a set of nodes constituting a community in one dimension tends to constitute a community in other dimensions as well. The redundancy measures the fraction of redundant links in a multi-dimensional network, defined as

$$\begin{aligned} \rho _{c} = \sum _{(u, v) \in {\bar{P}}_{c}} \frac{|\{\alpha : \exists (u, v)^\alpha \in E\}|}{L \times \left| P_{c}\right| }, \end{aligned}$$
(22)

where c represents a community, \(\alpha \) is one of the layers \({\mathcal {L}} = \{1,2,\ldots ,L\}\), P is the set of node pairs (u, v) connected in at least one layer of the multilayer network, and \({\bar{P}}\) is the set of node pairs connected in at least two layers. \(P_{c} \subseteq P\) is the subset of P appearing in c, and \({\bar{P}}_{c} \subseteq {\bar{P}}\) is the subset of \({\bar{P}}\) containing only node pairs in c. The more layers connect each pair of nodes within a community, the higher the redundancy will be.
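A direct reading of Eq. (22) is sketched below. The representation of a multilayer network as a list of per-layer edge lists and the function name `redundancy` are assumptions for illustration; pairs connected in only one layer contribute to the denominator \(|P_c|\) but not to the numerator.

```python
from itertools import combinations


def redundancy(layers, members):
    """Redundancy rho_c of Eq. (22) for one community.

    layers: list of edge collections, one per layer (undirected edges).
    members: nodes of the community c.
    """
    layer_edges = [{frozenset(e) for e in layer} for layer in layers]
    n_pairs = 0            # |P_c|: pairs in c linked in at least one layer
    redundant_links = 0    # layer count summed over pairs linked in >= 2 layers
    for u, v in combinations(sorted(set(members)), 2):
        pair = frozenset((u, v))
        count = sum(pair in edges for edges in layer_edges)
        if count >= 1:
            n_pairs += 1
        if count >= 2:
            redundant_links += count
    if n_pairs == 0:
        return 0.0
    return redundant_links / (len(layers) * n_pairs)


# Hypothetical two-layer example: pairs ab and bc appear in both layers.
layer1 = [("a", "b"), ("b", "c"), ("a", "c")]
layer2 = [("a", "b"), ("b", "c")]
print(redundancy([layer1, layer2], {"a", "b", "c"}))   # 4 / (2 * 3) = 2/3
```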

3.2.3 Purity, NMI, and ARI

Clustering accuracy measures are widely utilized to evaluate and compare the performance of community detection algorithms on real-world networks with given ground-truth communities. Let \(\varOmega =\{\omega _1,\omega _2,\ldots ,\omega _k\}\) be the computed clusters and \(C=\{c_1,c_2,\ldots ,c_k\}\) the ground-truth classes. Purity (Zhao and Karypis 2004) represents the percentage of nodes classified correctly, defined as

$$\begin{aligned} Purity(\varOmega , C)=\frac{1}{N} \sum _{k} \max _{j}\left| \omega _{k} \cap c_{j}\right| , \end{aligned}$$
(23)

where N is the total number of nodes, and \(|\omega _k \cap \ c_j |\) is the number of nodes in the intersection of \(\omega _k\) and \(c_j\). To trade off the quality of the clustering against the number of clusters, we can utilize normalized mutual information (NMI) (Danon et al. 2005). The confusion matrix is built from the ground-truth communities and the generated partitions, and NMI is defined as

$$\begin{aligned} NMI(A,B) = \frac{-2 \sum _{i=1}^{C_A} \sum _{j=1}^{C_B} N_{ij} \log \frac{N_{ij} N}{N_i N_j}}{\sum _{i=1}^{C_A} N_i \log \frac{N_i}{N} + \sum _{j=1}^{C_B} N_j \log \frac{N_j}{N} }, \end{aligned}$$
(24)

where A and B denote the ground-truth communities and the detected partitions, and \(C_A\) and \(C_B\) are the numbers of groups in A and B, respectively. \(N_{ij}\) is the element of the confusion matrix, i.e., the number of nodes in group i of A that also appear in group j of B. \(N_i\) is the sum of the elements in row i, \(N_j\) is the sum of the elements in column j, and N is the number of nodes. The range of NMI is [0, 1]. If \(A=B\), \( NMI(A,B)=1\). If A and B are independent, \(NMI(A,B)=0\). If each partition contains about z communities, computing NMI requires \(O(z^2)\) comparisons, which becomes costly for large-scale networks. To cope with this computational complexity, several approaches (Cazabet et al. 2015; Rossetti et al. 2016a, b) employ precision, recall, and \(F_\text {1-score}\), defined as

$$\begin{aligned} Precision= & {} \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \end{aligned}$$
(25)
$$\begin{aligned} Recall= & {} \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \end{aligned}$$
(26)
$$\begin{aligned} F_{1-\text {score}}= & {} \frac{2\times (\text {Precision}\times \ \text {Recall})}{\text {Precision} + \text {Recall}}, \end{aligned}$$
(27)

with TP = true positive, FP = false positive, FN = false negative and TN = true negative. Besides, the Rand Index is also a popular measure, which represents the percentage of decisions that are correct (i.e., accuracy), defined as

$$\begin{aligned} RI(\varOmega , C)= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}+\mathrm{TN}}. \end{aligned}$$
(28)
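The text does not state how TP, FP, FN and TN are counted for a clustering. A common convention, which we assume here, treats every pair of nodes as one decision: a pair is a true positive if both the detected partition and the ground truth place it in the same group. Under that assumption, Eqs. (25) to (28) can be sketched as follows (function names are hypothetical).

```python
from itertools import combinations


def pair_counts(pred, truth):
    """Count pairwise decisions; pred and truth map node -> group label."""
    tp = fp = fn = tn = 0
    for u, v in combinations(sorted(pred), 2):
        same_pred = pred[u] == pred[v]
        same_true = truth[u] == truth[v]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pair_scores(pred, truth):
    """Return (precision, recall, F1, Rand index) from pairwise counts."""
    tp, fp, fn, tn = pair_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, rand_index


# Hypothetical example: node 2 is assigned to the wrong group.
truth = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
pred = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
print(pair_counts(pred, truth))   # (4, 3, 2, 6)
print(pair_scores(pred, truth))   # precision 4/7, recall 2/3, F1 8/13, RI 2/3
```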

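The adjusted Rand index (ARI) (Schütze et al. 2008) is the RI corrected for chance: it equals 1 for identical partitions, is close to 0 for random assignments, and can be negative. All the other metrics in this subsection lie in the range [0, 1], and for every metric, the higher the value, the better the clustering quality.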
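Purity (Eq. 23) and NMI (Eq. 24) can both be derived from the confusion matrix. The sketch below is a minimal illustration in which partitions are given as dictionaries from node to label; the function names are hypothetical, and the NMI denominator uses the row and column sums \(N_i\) and \(N_j\) of the confusion matrix.

```python
from collections import Counter
from math import log


def purity(pred, truth):
    """Purity of Eq. (23): each detected cluster votes for its majority class."""
    overlap = Counter((pred[v], truth[v]) for v in pred)
    best = {}
    for (cluster, _cls), count in overlap.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(pred)


def nmi(pred, truth):
    """Normalized mutual information of Eq. (24)."""
    n = len(pred)
    joint = Counter((truth[v], pred[v]) for v in pred)   # confusion matrix N_ij
    row = Counter(truth[v] for v in pred)                # row sums N_i
    col = Counter(pred[v] for v in pred)                 # column sums N_j
    num = -2 * sum(nij * log(nij * n / (row[i] * col[j]))
                   for (i, j), nij in joint.items())
    den = (sum(ni * log(ni / n) for ni in row.values())
           + sum(nj * log(nj / n) for nj in col.values()))
    # den is zero only when both partitions consist of a single group.
    return num / den if den else 1.0


# Hypothetical example: node 2 is assigned to the wrong group.
truth = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
pred = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
print(purity(pred, truth))   # 5/6
print(nmi(truth, truth))     # 1 for identical partitions (up to rounding)
print(nmi(pred, truth))      # strictly between 0 and 1
```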

3.3 Algorithms classification

Early approaches either collapse multilayer networks into a weighted single-layer network (Berlingerio et al. 2011a; Tang et al. 2012b; Taylor et al. 2016a, b), or apply existing algorithms to each layer and then merge the partitions via consensus clustering (Tang et al. 2009; Papalexakis et al. 2013). However, these approaches have been criticized for ignoring the connections across layers, thereby resulting in low accuracy. More recent research has produced novel algorithms that discover communities in multilayer networks directly (Ma et al. 2018; Pamfil et al. 2019). Generally speaking, the strategies are mainly classified into three categories (Tagarelli et al. 2017):

  • Flattening methods

  • Aggregation methods

  • Direct methods

Flattening methods collapse the layers’ information into a single layer and then conduct the traditional monolayer algorithms for detecting communities. This strategy is very common in multiplex networks, where the multiplex network is converted into a multi-relationship network (Rocklin and Pinar 2013) or a monolayer network (Kuncheva and Montana 2015).
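A minimal sketch of flattening is given below. The weighting rule is our assumption: each edge of the collapsed network is weighted by the number of layers in which it appears, which is only one of several possible schemes. The function name `flatten_layers` is hypothetical.

```python
from collections import Counter


def flatten_layers(layers):
    """Collapse layers into one weighted, undirected edge dictionary.

    layers: list of edge lists, one per layer.
    Returns {(u, v): weight} with u <= v, weight = number of layers
    containing the edge (an assumed weighting scheme).
    """
    weights = Counter()
    for layer in layers:
        # Normalise direction and drop duplicates within a layer.
        undirected = {tuple(sorted(edge)) for edge in layer}
        weights.update(undirected)
    return dict(weights)


# Hypothetical two-layer network on nodes 1..4.
layers = [[(1, 2), (2, 3)], [(2, 1), (3, 4)]]
print(flatten_layers(layers))   # weights: (1, 2) -> 2, (2, 3) -> 1, (3, 4) -> 1
```

Any weighted monolayer algorithm can then be run on the result, at the cost of losing the information about which layer each edge came from.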

Aggregation methods discover the communities in each layer and then merge them by a certain aggregation mechanism, which could be useful for removing redundant information. The aggregation process requires \(2^L\) comparisons and is very time-consuming for a temporal network with numerous layers. Thus, “layer communities grouping” was proposed to reduce the redundant layers (Kao and Porter 2018). Dalibard proposed a parameter \(P_{C_k}^l\) for each layer \(l\in L\) to describe the probability of communities, and then aggregated the communities in each layer in terms of a correlation coefficient \(\rho _{\alpha \beta }\) between layers \(\alpha \) and \(\beta \) (Dalibard 2012). Numerical experiments on synthetic multilayer networks show that the analysis fails in aggregated networks, whereas the multilayer method can accurately identify modules across layers that originate from the same interaction process (De Domenico et al. 2015a). Thus, aggregation is not recommended.

Direct methods aim to detect the community structures directly on the multilayer network by optimizing some quality-assessment criteria without flattening (Oselio et al. 2015). For example, Pramanik et al. defined a multilayer modularity index, i.e., \(Q_M\), and combined it with improved GN and Louvain algorithms, yielding GN-\(Q_M\) and Louvain-\(Q_M\), respectively (Pramanik et al. 2017).

Plenty of effort has been made in the last decades; however, research on multilayer networks is still in its infancy (Tagarelli et al. 2017), and extracting communities without collapsing multilayer networks is a promising direction. Several representative methods are introduced in the following subsections.

3.3.1 Improved label propagation algorithm

Label propagation algorithms (LPA) utilize the propagation features of networks; they have linear complexity and produce reasonable results. These methods allow a node to adopt new characteristics depending on the behavior of its neighbors, e.g., to adopt the label held by the largest number of its neighbors. The process of LPA is as follows:

  1. Step 1:

    Traverse the network and assign a unique label to each node.

  2. Step 2:

    Establish a random order in which the nodes are updated.

  3. Step 3:

    Each node is updated in the assigned order and adopts the most frequent label among its neighbors.

  4. Step 4:

    The process is performed iteratively until the algorithm converges and no label changes occur anymore.
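The four steps above can be sketched as follows. This is an illustrative implementation under assumptions of ours: the graph is an undirected dictionary of neighbor sets, ties are broken at random, a node keeps its current label when that label is among the most frequent ones (a common choice that helps convergence), and `max_iter` is a hypothetical safety bound.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_iter=100):
    """Asynchronous LPA; adj maps each node to the set of its neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}                 # Step 1: unique labels
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                       # Step 2: random update order
        changed = False
        for v in nodes:                          # Step 3: adopt majority label
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            if labels[v] in candidates:
                continue                         # current label is already maximal
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:                          # Step 4: stop when stable
            break
    return labels


# Hypothetical example: two disconnected triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
       3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj)
print(labels)   # one shared label per triangle; which label survives is random
```

On connected graphs the outcome depends on the update order and tie-breaking, which is the source of the instability discussed for this family of methods.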

Inspired by the traditional LPA process, Alimadadi et al. (2019) redefined the neighborhood in multilayer networks and then proposed MNLPA to detect communities in a weighted and directed Facebook activity network. The algorithm is summarized as follows:

  1. Step 1:

    Each node u is initialized with a unique label, and the neighbors of u in all layers are collected as \(N_u\).

  2. Step 2:

    Calculate the similarity (measures are listed in Table 3) between u and each node v in \(N_u\), and mark v when the similarity score exceeds a given threshold \(\sigma \).

  3. Step 3:

    Repeat the following steps until the stop criterion is satisfied:

    • Nodes are ordered randomly;

    • For each node u, each marked similar neighbor sends its label to u, and u adopts the label received most often.

  4. Step 4:

    Form communities from the nodes that share the same label.
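The steps above can be sketched in a simplified form. The code below is not the authors' implementation: it ignores edge weights and directions, and because the similarity measures of Table 3 are not reproduced here, it substitutes the Jaccard index of closed multilayer neighborhoods as an assumed similarity. The function name `mnlpa_sketch` and its parameters are hypothetical.

```python
import random
from collections import Counter


def mnlpa_sketch(layers, sigma=0.5, seed=0, max_iter=100):
    """Simplified, unweighted and undirected variant of the MNLPA steps."""
    rng = random.Random(seed)
    # Step 1: unique labels; N_u collects the neighbours of u over all layers.
    neigh = {}
    for layer in layers:
        for u, v in layer:
            neigh.setdefault(u, set()).add(v)
            neigh.setdefault(v, set()).add(u)
    labels = {u: u for u in neigh}

    # Step 2: mark neighbours whose similarity exceeds sigma.
    # Assumed similarity: Jaccard index of closed neighbourhoods.
    def jaccard(u, v):
        a, b = neigh[u] | {u}, neigh[v] | {v}
        return len(a & b) / len(a | b)

    similar = {u: [v for v in neigh[u] if jaccard(u, v) > sigma]
               for u in neigh}

    # Step 3: propagate labels from marked neighbours until nothing changes.
    nodes = sorted(neigh)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for u in nodes:
            if not similar[u]:
                continue
            counts = Counter(labels[v] for v in similar[u])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[u] not in best:
                labels[u] = rng.choice(best)
                changed = True
        if not changed:
            break

    # Step 4: group nodes sharing a label.
    communities = {}
    for u, lab in labels.items():
        communities.setdefault(lab, set()).add(u)
    return list(communities.values())


# Hypothetical two-layer network: two triangles spread over both layers,
# joined by the weakly similar bridge (2, 3).
layer1 = [(0, 1), (1, 2), (2, 3)]
layer2 = [(0, 2), (3, 4), (4, 5), (3, 5)]
print(mnlpa_sketch([layer1, layer2], sigma=0.5))   # the two triangles
```

In this toy example the bridge (2, 3) has Jaccard similarity 1/3, so with \(\sigma = 0.5\) it is not marked and the two triangles separate; lowering the threshold below 1/3 lets labels cross the bridge, which illustrates the sensitivity to the threshold.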

Table 3 Similarity measures for MNLPA algorithm

The MNLPA is efficient and capable of dealing with weighted and directed networks, but it is also criticized for its instability. The partition result is sensitive to the threshold parameter and to the density of the network dataset. According to the authors, the MNLPA algorithm was verified by experiments on real-world datasets, and the results are reprinted in Fig. 13.

Fig. 13

Comparison of LPA and MNLPA algorithms conducting on Facebook datasets. The homophily obtained by MNLPA algorithm is marked by gray bars, thus the resultant communities are similar and fit the definition of the community to some extent

As the Facebook datasets are not provided in their work, we construct several synthetic three-layered networks, as plotted in Fig. 14.

Fig. 14

Three synthetic multilayer networks for evaluating the proposed MNLPA. The top three panels show the multilayer networks, in which each has three layers with community labels generated by LFR benchmark (Lancichinetti et al. 2008). The bottom three panels show the relevant supra-adjacency matrices

Experiments on the constructed synthetic networks suggest that the MNLPA algorithm is sensitive to its parameters. The change of modularity with the varying threshold \(\delta \) is plotted in Fig. 15.

Fig. 15

Modularity obtained by the MNLPA algorithm on three different synthetic networks as the threshold \(\delta \) varies

As shown in Fig. 15, the performance of MNLPA is not satisfactory. The modularity increases as the threshold increases over [0.2, 0.4], so we conjecture that the threshold should be greater than approximately 0.4. The Facebook datasets utilized in the original experiments are weighted and directed, which may account for prominent differences in the experimental results. In brief, MNLPA is suitable for large-scale networks under certain conditions (e.g., weighted and directed), while for general multilayer networks, modifications are necessary to improve the performance.

3.3.2 Nonnegative matrix factorization methods

Nonnegative matrix factorization (NMF) was proposed by Lee and Seung (2001). It aims to factorize the original nonnegative matrix into the product of two other nonnegative matrices. For applications in community detection methods, the original nonnegative matrix can be an adjacency matrix, thereby the objective (loss) function can be presented as

$$\begin{aligned} \min _{U \ge 0, V \ge 0} L(A, UV^T) = \left\Vert A - UV^T\right\Vert _F^2, \end{aligned}$$
(29)

where A is an \(n \times n\) adjacency matrix, and both U and V are \(n \times k\) matrices. The rank k corresponds to the number of communities. NMF has been widely utilized in detecting communities in complex networks (Jiao et al. 2017; Liu et al. 2017; Wu et al. 2018).
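One standard way to minimize Eq. (29) is the multiplicative update scheme of Lee and Seung. The sketch below uses only the Python standard library so that it stays self-contained; the helper names, the random initialization range, the small constant `eps` that guards against division by zero, and the argmax rule for reading communities off U are all assumptions for illustration.

```python
import random


def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_loss(A, U, V):
    """Squared Frobenius norm of A - U V^T, the objective of Eq. (29)."""
    R = matmul(U, transpose(V))
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||A - U V^T||_F^2 with U, V >= 0."""
    rng = random.Random(seed)
    n = len(A)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        # U <- U * (A V) / (U V^T V), element-wise
        AV = matmul(A, V)
        UVtV = matmul(U, matmul(transpose(V), V))
        U = [[U[i][c] * AV[i][c] / (UVtV[i][c] + eps) for c in range(k)]
             for i in range(n)]
        # V <- V * (A^T U) / (V U^T U), element-wise
        AtU = matmul(transpose(A), U)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[V[i][c] * AtU[i][c] / (VUtU[i][c] + eps) for c in range(k)]
             for i in range(n)]
    return U, V


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for a, b in links:
    A[a][b] = A[b][a] = 1

U, V = nmf(A, k=2)
print(frobenius_loss(A, U, V))
# Assumed read-out: assign each node to the column of U with the largest entry.
print([max(range(2), key=lambda c: U[i][c]) for i in range(6)])
```

The result depends on the random initialization, so in practice several restarts are usually compared by their final loss.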

Recently, Ma et al. applied this method (S2j-NMF) to community detection in multilayer networks (Ma et al. 2018). They propose a quantitative function (i.e., multilayer modularity density) and prove that the trace optimization of multilayer modularity density is equivalent to the objective functions of several community detection algorithms (e.g., k-means (MacQueen 1967), NMF, spectral clustering (Ng et al. 2002), multiview clustering for multilayer networks, etc.). The modularity density \(Q_{D}\) for \(\left\{ V_{c}\right\} _{c=1}^{k}\) is defined as

$$\begin{aligned} Q_{D}\left( \left\{ V_{c}\right\} _{c=1}^{k}\right) =\sum _{c=1}^{k} \frac{L\left( V_{c}, V_{c}\right) -L\left( V_{c}, {\bar{V}}_{c}\right) }{\left| V_{c}\right| }, \end{aligned}$$
(30)

where \(Q_{D}(\left\{ V_{c}\right\} _{c=1}^{k})\) is the modularity density of the community partitions, k is the number of partitions, and \(\bar{V_c}\) denotes the set of nodes outside \(V_c\). \(L(V_i,V_j)\) measures the connections between \(V_i\) and \(V_j\), defined as

$$\begin{aligned} L(V_i,V_j)=\sum _{p\in V_i,q\in V_j} w_{pq}, \end{aligned}$$
(31)

where p and q are nodes of the partitions \(V_i\) and \(V_j\), respectively, and \(w_{pq}\) is the weight of the edge (p, q), which equals 1 in unweighted networks. For a multilayer network with m layers, the objective function is generalized by averaging over the layers as

$$\begin{aligned} Q_{D}^{{\mathcal {G}}}\left( \left\{ V_{c}\right\} _{c=1}^{k}\right) =\frac{1}{m} \sum _{l=1}^{m} \sum _{c=1}^{k} \frac{L^{l}\left( V_{c}, V_{c}\right) -L^{l}\left( V_c, {\bar{V}}_c\right) }{\left| V_c\right| }, \end{aligned}$$
(32)

where \(L^l(V_i,V_j)\) is Eq. (31) applied to layer l, and \(Q_D^{\mathcal {G}} (\{V_c\}^k_{c=1})\) is the objective function of the partitions in the multilayer network \({\mathcal {G}}\). Thus, the optimal partitioning \({\{V_c\}}_{c=1}^k\) for the multilayer network, obtained by maximizing the modularity density in each layer, can be represented as

$$\begin{aligned} \left\{ \begin{array}{l} {\max \left( Q_{D}^{1}(\left\{ V_{c}\right\} _{c=1}^{k})\right) } \\ {\max \left( Q_{D}^{2}(\left\{ V_{c}\right\} _{c=1}^{k})\right) } \\ {\cdots } \\ {\max \left( Q_{D}^{m}(\left\{ V_{c}\right\} _{c=1}^{k})\right) } \end{array}\right. . \end{aligned}$$
(33)

Afterward, dense subgraphs are discovered in multilayer networks by employing a greedy search strategy. The conventional NMF algorithm is then applied to each layer, with a factorized basis matrix and various coefficient matrices. Finally, experiments conducted on several datasets verify the proposed method.

The complexity of this method is \(O(mn^2k)\), where m is the number of layers and k is the number of partitions. Thus, it is probably not acceptable for large-scale networks. Besides, as the authors mentioned, the algorithm is based on multiplex networks, which is not capable of handling the general form of multilayer networks. The algorithm relies on prior information and the number of target communities, and the decomposition process might be time-consuming.

3.3.3 Random walk methods

Kuncheva and Montana propose a community detection algorithm, namely LART (Locally Adaptive Random Transitions), for the detection of communities that are shared by either some or all of the layers in multiplex networks (Kuncheva and Montana 2015). They employ the supra-adjacency matrix \({\check{M}}\) and define the transition probabilities of four possible moves among the nodes, described as

$$\begin{aligned} \left\{ \begin{aligned} P_{(i, k)(i, k)}&=\frac{{\check{M}}_{(i, k)(i, k)}}{k_{i, k}} \\ P_{(i, k)(j, k)}&=\frac{{\check{M}}_{(i, k)(j, k)}}{k_{i, k}} \\ P_{(i, k)(i, l)}&=\frac{{\check{M}}_{(i, k)(i, l)}}{k_{i, k}} \\ P_{(i, k)(j, l)}&=0 \end{aligned}\right. , \end{aligned}$$
(34)

where \(k_{i,k}\) is the multiplex degree of node \(v_i^k\) in \({\check{M}}\), defined as \(k_{i,k}=\sum _{j,l} {\check{M}}_{(i,k)(j,l)}\), and \(P_{(i,k)(j,l)}\) is the transition probability from node i of layer k (i.e., \(v_i^k\)) to node j of layer l (i.e., \(v_j^l\)). The probability of moving from node \(v_i^k\) to node \(v_j^l\) is zero when \(i\ne \ j\) and \(k\ne \ l\), since a direct move cannot exist where there is no connection. The transition probabilities form the matrix \({\mathcal {P}}\) of the random walk process, written as

$$\begin{aligned} {\mathcal {P}}={\mathcal {D}}^{-1}{\check{M}}, \end{aligned}$$
(35)

where \({\mathcal {D}}\) is the diagonal matrix defined by the multiplex node degrees. A dissimilarity matrix S(t), which depends on a multiplex random walk of t steps, is then defined. Its form depends on whether the nodes i and j are in the same layer or in different layers, given by

$$\begin{aligned} S(t)_{(i, k)(j, k)}= & {} \sqrt{\sum _{h=1}^{N} \sum _{m=1}^{L} \frac{\left( {\mathcal {P}}_{(i, k)(h, m)}^{t}-{\mathcal {P}}_{(j, k)(h, m)}^{t}\right) ^{2}}{k(h, m)}}, \end{aligned}$$
(36)
$$\begin{aligned} S(t)_{(i, k)(j, l)}= & {} \sqrt{s_{1}+s_{2}+s_{3}}, \end{aligned}$$
(37)

where \(s_1, s_2, s_3\) are defined as

$$\begin{aligned} \left\{ \begin{array}{l} {s_{1}=\sum _{h=1}^{N}\left( \frac{{\mathcal {P}}_{(i, k)(h, k)}^{t}}{\sqrt{k(h, k)}}-\frac{{\mathcal {P}}_{(j, l)(h, l)}^{t}}{\sqrt{k(h, l)}}\right) ^{2}} \\ {s_{2}=\sum _{h=1}^{N}\left( \frac{{\mathcal {P}}_{(i, k)(h, l)}^{t}}{\sqrt{k(h, l)}}-\frac{{\mathcal {P}}_{(j, l)(h, k)}^{t}}{\sqrt{k(h, k)}}\right) ^{2}} \\ {s_{3}=\sum _{h=1}^{N} \sum _{m=1 ; m \ne k, l}^{L} \frac{\left( {\mathcal {P}}_{(i, k)(h, m)}^{t}-{\mathcal {P}}_{(j, l)(h, m)}^{t}\right) ^{2}}{k(h, m)}} \end{array}\right. . \end{aligned}$$
(38)
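The random walk quantities of Eqs. (35) and (36) can be sketched for a small multiplex network as follows. The code is illustrative only: the supra-adjacency matrix is passed as a dense list of lists indexed by node-layer pairs, every node-layer pair is assumed to have at least one connection, only the same-layer distance of Eq. (36) is implemented, and all function names are hypothetical.

```python
from math import sqrt


def transition_matrix(M):
    """P = D^{-1} M of Eq. (35): divide each row by its multiplex degree."""
    P = []
    for row in M:
        degree = sum(row)          # assumed to be positive for every row
        P.append([x / degree for x in row])
    return P


def matrix_power(P, t):
    """t-step transition probabilities P^t (identity for t = 0)."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = [[sum(R[i][h] * P[h][j] for h in range(n)) for j in range(n)]
             for i in range(n)]
    return R


def same_layer_dissimilarity(M, a, b, t):
    """Eq. (36) for two node-layer indices a and b of the same layer."""
    degree = [sum(row) for row in M]
    Pt = matrix_power(transition_matrix(M), t)
    return sqrt(sum((Pt[a][h] - Pt[b][h]) ** 2 / degree[h]
                    for h in range(len(M))))


# Hypothetical multiplex network: 3 nodes, 2 layers; index = 3 * layer + node.
# Each layer is a star centred on node 0, and every node is linked to its replica.
links = [(0, 1), (0, 2), (3, 4), (3, 5), (0, 3), (1, 4), (2, 5)]
M = [[0] * 6 for _ in range(6)]
for a, b in links:
    M[a][b] = M[b][a] = 1

print(same_layer_dissimilarity(M, 1, 2, t=1))   # 0.5
```

Dense matrix powers cost cubic time per step, so a practical implementation would rely on sparse linear algebra instead.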

Afterward, the agglomerative clustering is utilized to merge nodes in communities. The multiplex modularity \(Q_M\) is employed to evaluate the quality of partitions. The process of LART is shown as follows:

  1. Step 1:

    Assign each node in each layer to its own community.

  2. Step 2:

    Merge nodes based on the average linkage criterion using the distance matrix S, subject to the constraint that a merged community has at least one within-layer or interlayer connection.

  3. Step 3:

    Stop merging at the level where \(Q_M\) reaches its maximum.

  4. Step 4:

    Obtain the shared and non-shared communities.

The experiments are conducted on synthetic multiplex networks, and the experimental results are shown in Fig. 16.

Fig. 16

The comparison of the proposed LART algorithm with MM and PMM algorithms (Kuncheva and Montana 2015) on the five simulated scenarios of the synthetic network

The LART algorithm was evaluated on five different scenarios, and the experimental results demonstrate its performance. However, the algorithm is limited to multiplex networks, and real-world networks are much more complicated. Hence, the performance of the LART algorithm on real-world datasets is uncertain.

3.3.4 Multi-objective optimization methods

Pizzuti and Socievole (2017) proposed the Multi-layer many-objective Optimization algorithm (MLMaOP), in which they formulated the community detection problem in multilayer networks as a many-objective optimization problem, where a given objective is simultaneously optimized on all the network layers. In their work, they give the multi-objective optimization problem (MOP) as

$$\begin{aligned} \min _{x} F(x)= \left( f_{1}(x), f_{2}(x), \ldots , f_{d}(x)\right) \text{ subject } \text{ to } x \in X, \end{aligned}$$
(39)

where d is the number of objective functions, \(x=(x_1, x_2, \ldots , x_n) \in X \) is the decision vector with a domain of definition \(X \subseteq R^n, F: X \rightarrow Z\) is the mapping from the decision space X to the objective space Z. When \(d \ge 3\), an MOP is referred to as Many Objective Optimization Problem (MaOP) (Farina and Amato 2002). Pareto-dominance relation is used to define a partial ordering in the objective space. Thus, the problem of community detection in multilayer networks using MaOP is defined as

$$\begin{aligned} \min F(P) = (F_1(P),F_2(P), \ldots ,F_d(P)) \text { subject to } P \in \varOmega , \end{aligned}$$
(40)

where each \(F_{\alpha }: \varOmega \rightarrow R\) computes the value of the objective function only on the layer \(G_\alpha \). Since the purpose is to maximize Q, setting \(F_\alpha (P)=-Q_\alpha (P)\) means that the greater \(Q_\alpha \) is on a layer, the smaller \(F_\alpha (P)\) becomes. The main process of MLMaOP is as follows:

  1. Step 1:

    Initialize a random partition by using the adjacency matrix of projected M.

  2. Step 2:

    Traverse the partition in all the layers, and evaluate the objective function on \(G_\alpha \) to obtain \(F_\alpha (P)\). Assign a rank based on Pareto dominance and then combine parent and offspring partitions into fronts.

  3. Step 3:

    Select the best points, and apply the variation operators and create the next partition.

  4. Step 4:

    Choose a solution from the Pareto front.
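The ranking in Step 2 can be illustrated by sorting candidate partitions into successive non-dominated fronts. The sketch below covers only this ranking step, not the full evolutionary loop of MLMaOP; the function names and the per-layer modularity values are hypothetical, and objectives are minimized with \(F_\alpha = -Q_\alpha \) as in Eq. (40).

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def rank_fronts(objectives):
    """Sort candidates into fronts; front 0 is the Pareto front.

    objectives: list of vectors (F_1(P), ..., F_d(P)), one per candidate.
    Returns a list of fronts, each a sorted list of candidate indices.
    """
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical per-layer modularities Q_alpha of five candidate partitions.
Q = [(0.40, 0.10), (0.30, 0.30), (0.10, 0.40), (0.20, 0.20), (0.05, 0.05)]
F = [tuple(-q for q in qs) for qs in Q]     # F_alpha(P) = -Q_alpha(P)
print(rank_fronts(F))                       # [[0, 1, 2], [3], [4]]
```

Choosing one solution from the first front (Step 4) then requires an additional criterion, since the front members are mutually incomparable under Pareto dominance.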

The comparison of MLMaOP algorithm with other approaches (Loe and Jensen 2015) is shown in Fig. 17.

Fig. 17

The comparison of MLMaOP algorithm with competitors on SSRM dataset (Loe and Jensen 2015)

The proposed algorithm with three different strategies is competitive on the partitions \(P_1\), while on \(P_2\) and \(P_3\), we can see that the NMI results obtained by the other algorithms are better. Besides, the MLMaOP algorithm suffers from a low convergence rate to Pareto front and is likely to be time-consuming for detecting communities in large-scale networks.

3.4 Discussion

In the last decade, a plethora of approaches have been proposed to address the community detection problem with enormous network data. We list several representative methods (from 2009 to 2019), as shown in Table 4.

Table 4 A brief comparison of community detection methods in multilayer networks

Table 4 shows that most of the presented methods have relatively high complexity. The GN-\(Q_M\), Louvain-\(Q_M\) and LART methods are based on multiplex modularity maximization and are ill-suited to general multilayer networks. Moreover, the majority are designed for multiplex networks, which require the nodes in each layer to be aligned. As introduced in the previous subsection, some improved versions of classic monolayer algorithms, e.g., GenLouvain, have been regarded as benchmarks and are promising for general multilayer networks. We anticipate that four directions, i.e., random walk-based methods, tensor decomposition, nonnegative matrix factorization, and modularity optimization, will receive increasing attention over time.

In addition to the above-mentioned directions, a considerable number of algorithms focus on overlapping community detection (Liu et al. 2018) and local community detection (Interdonato et al. 2017; Jeub et al. 2015; Li et al. 2019). On the one hand, as network scale increases, global computation becomes time-consuming, which brings local community detection into view. Liu et al. (2017) proposed an improved multi-objective evolutionary approach for community detection in multilayer networks. Aiming to solve the local community detection problem, they employ a string-based representation scheme together with genetic operations and local search. However, the algorithm adopts the strategy of running the Louvain algorithm (Blondel et al. 2008) on each layer and then merging the partitions, which seems to deviate from the multilayer community concept. Furthermore, comparisons with competing methods are not provided. On the other hand, overlapping communities are also ubiquitous in multilayer networks (De Domenico et al. 2016), i.e., some nodes belong to multiple communities simultaneously (Chen et al. 2016). Kao and Porter (2018) proposed a method that computes pairwise similarities between layers and then performs community detection to group structurally similar layers in multiplex networks. The algorithm is verified on both synthetic and empirical multiplex networks. As most of the compared algorithms are designed for multiplex networks, a great deal of work remains to be done on community detection in general multilayer networks.

In brief, research on community detection for multilayer networks is still in its infancy. At the time of this writing, there are still no standard algorithms for general multilayer networks, and quite a few problems remain to be solved, such as optimizing algorithms to avoid time-consuming procedures, extending algorithms to the general form of multilayer networks, and simplifying the mathematical models.

4 Applications

The study of community detection in multilayer networks has flourished over the last decade. Relevant research covers various aspects of daily life, such as analyzing influential users across multiple social platforms (Al-Garadi et al. 2018), finding the organization of proteins in a biological system (Gosak et al. 2018), and managing urban transportation systems with various modes of transport (Liu et al. 2019). The following subsections summarize applications of community detection via a multilayer network framework.

4.1 Temporal networks partition

Community detection in temporal networks, i.e., temporal community detection, aims to find how communities emerge, grow, combine, and decay in an evolving process (Kawadia and Sreenivasan 2012). A common approach to detecting temporal communities is to obtain communities independently in each snapshot using static methods and then match as many communities as possible between two snapshots. However, this approach fails to reveal the evolving process, because it does not adequately use the partitions found in past snapshots to inform the identification of the optimal partition in the current snapshot (Jiao et al. 2017). Thus, at the very first step of modeling, the traditional graph model is incapable of representing inter-connections in a temporal network.
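The snapshot-matching step of this two-stage approach can be sketched as follows. The sketch assumes that each snapshot has already been partitioned by some static method and that communities are matched greedily by Jaccard similarity. The function names, the similarity threshold of 0.3, and the toy communities are hypothetical choices for illustration only.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two node sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(
    prev: List[Set[str]], curr: List[Set[str]], threshold: float = 0.3
) -> Dict[int, Optional[int]]:
    """Map each community in prev to its most similar community in curr.

    A community with no counterpart above the threshold maps to None,
    which can be read as the community having dissolved.
    """
    matches: Dict[int, Optional[int]] = {}
    for i, c_prev in enumerate(prev):
        best_j, best_sim = None, threshold
        for j, c_curr in enumerate(curr):
            sim = jaccard(c_prev, c_curr)
            if sim > best_sim:
                best_j, best_sim = j, sim
        matches[i] = best_j
    return matches


# Toy partitions of two consecutive snapshots (made-up node labels).
prev = [{"a", "b", "c"}, {"d", "e"}]
curr = [{"d", "e", "f"}, {"a", "b"}, {"g", "h"}]
matches = match_communities(prev, curr)
print(matches)
```

Note that the matching only compares finished partitions. Nothing found in the earlier snapshot influences how the later snapshot is partitioned, which is exactly the limitation pointed out above.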

The multilayer network model is commonly employed in the study of time-varying networks (also called temporal networks or multi-slice networks), in which the time snapshots are modeled as layers and the layers are ordered in sequence. However, there is a foundational question: across how many layers must a community persist in order for layer aggregation to benefit detection? To address this question, layer aggregation (De Domenico et al. 2014) has been proposed as a way to reduce data size or as a data filter that benefits network-analysis outcomes. Since Mucha et al. (2010) introduced the multiplex modularity optimization method, numerous attempts have been made in this field (Drugan et al. 2011; Nguyen et al. 2011; Li and Garcia-Luna-Aceves 2013), which triggered an upsurge in unveiling communities in time-varying networks. Taylor et al. (2017) applied random matrix theory and found that layer aggregation significantly influences detectability. The detectability limit refers to the fact that if the community structure is too weak, it cannot be found upon inspection of the network (Lancichinetti and Fortunato 2011). When the aggregate network corresponds to the summation of the adjacency matrices encoding the network layers, aggregation always improves detectability. This research helps to understand the contraction of network layers and to analyze pairwise-interaction data to obtain sparse network representations. Layer aggregation can also be applied to anomaly detection in network data, e.g., detecting harmful events such as attacks, intrusions, and fraud in cybersecurity.
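The summation form of layer aggregation mentioned above can be written down in a few lines. The sketch below assumes that all layers share the same node set and are given as square adjacency matrices stored as nested lists. The function name `aggregate_layers` and the two toy layers are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def aggregate_layers(layers: List[Matrix]) -> Matrix:
    """Sum the adjacency matrices of all layers entry by entry."""
    if not layers:
        raise ValueError("at least one layer is required")
    n = len(layers[0])
    for layer in layers:
        if len(layer) != n or any(len(row) != n for row in layer):
            raise ValueError("all layers must be n x n over the same node set")
    return [
        [sum(layer[i][j] for layer in layers) for j in range(n)]
        for i in range(n)
    ]


# Two toy snapshots over four nodes (made-up data).
layer_1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
layer_2 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
aggregated = aggregate_layers([layer_1, layer_2])
print(aggregated)
```

An edge that persists across layers, such as the one between the first two nodes here, receives a larger weight in the aggregate network than an edge that appears in a single layer.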

4.2 Transportation networks optimization

On account of the critical role of transportation systems in modern society, the study of traffic dynamics has become one of the most successful applications of complex network theory. However, the vast majority of studies treat transportation networks as isolated systems, which is inconsistent with the fact that many complex networks are interrelated in a nontrivial way (Du et al. 2016). The transportation system comprises a variety of modes, such as bus, subway, tram, high-speed train, airline, and ship, so a comprehensive study should cover many of them. Early studies of traffic networks mainly focus on a single mode and ignore the interactions with other modes (Calimente 2012; Chen et al. 2014). Du et al. (2016) utilized a two-layered traffic network to study the distribution-based strategy and improved the generating rate of passengers using a particle swarm optimization algorithm. The multilayer network model utilized in this work is an idealized transportation system, in which each layer has a different topology and supports different traveling speeds. Passengers travel along the path of minimal traveling time and, at an additional cost, can transfer from one layer to another to avoid congestion. The research indicates that a degree centrality-based strategy is not overly beneficial in enhancing the performance of the system. However, starting from such a strategy and reassigning transfer costs using a particle swarm optimization algorithm improves the capacity and several other properties of the system at a reasonable computational cost. The research informs the selection of transport modes and exemplifies how multilayer network models are applied to urban transportation systems.
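The routing rule described above (minimal traveling time, with an extra cost for changing layers) can be illustrated by a shortest-path search over (node, layer) states. This is only a sketch under simplifying assumptions: a single uniform transfer cost, no congestion, and a hypothetical two-layer toy network. It is not the model of Du et al. (2016), and the function name `min_travel_time` is made up for the example.

```python
import heapq
from typing import Dict, List, Tuple

# layer name -> node -> list of (neighbor, travel_time)
Layers = Dict[str, Dict[str, List[Tuple[str, float]]]]


def min_travel_time(
    layers: Layers, source: str, target: str, transfer_cost: float
) -> float:
    """Dijkstra over (node, layer) states; changing layer costs transfer_cost."""
    start = [(0.0, source, name) for name in layers if source in layers[name]]
    best = {(node, layer): t for t, node, layer in start}
    heap = list(start)
    heapq.heapify(heap)
    while heap:
        t, node, layer = heapq.heappop(heap)
        if node == target:
            return t
        if t > best.get((node, layer), float("inf")):
            continue  # stale heap entry
        # Either travel along an edge of the current layer ...
        moves = [(nbr, layer, w) for nbr, w in layers[layer].get(node, [])]
        # ... or transfer to another layer that also contains this node.
        moves += [
            (node, other, transfer_cost)
            for other in layers
            if other != layer and node in layers[other]
        ]
        for nxt, nxt_layer, cost in moves:
            cand = t + cost
            if cand < best.get((nxt, nxt_layer), float("inf")):
                best[(nxt, nxt_layer)] = cand
                heapq.heappush(heap, (cand, nxt, nxt_layer))
    return float("inf")


# Toy system: a slow bus layer and a fast subway layer (made-up times).
layers = {
    "bus": {"A": [("B", 10.0)], "B": [("A", 10.0), ("C", 10.0)], "C": [("B", 10.0)]},
    "subway": {"B": [("C", 3.0)], "C": [("B", 3.0)]},
}
cheap_transfer = min_travel_time(layers, "A", "C", transfer_cost=2.0)
costly_transfer = min_travel_time(layers, "A", "C", transfer_cost=8.0)
print(cheap_transfer, costly_transfer)
```

With a cheap transfer the passenger switches to the subway at B, whereas with a costly transfer staying on the bus is faster. This illustrates why reassigning transfer costs changes how traffic is distributed across layers.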

Inspired by complex network theory and the multilayer network representation, Hong and Liang (2016) analyzed the Chinese airline transportation system within the multilayer network framework, in which each layer is defined by a commercial airline (company) and the link weights are set by the number of flights, the number of seats, and the geographical distance between pairs of airports, respectively. By calculating the clustering coefficient, average shortest path length, and assortativity coefficient of the airports, the research shows that the Chinese airline has considerably higher maximal degree and betweenness values than the other top airlines. Ding et al. (2018) proposed a method for detecting communities based on measurements in areas of Kuala Lumpur (the national capital of Malaysia). The multilayer network model employed in their research contains a railway layer and an urban street layer, and mainly focuses on detecting the changing structures of the rail network and mining communities in the urban network. The experimental results suggest that rail network growth triggers structural and community changes, i.e., when the upper-layer rail network grows from a simple tree-like network to a more intricate form, the network diameter and average shortest path length decrease dramatically. The growth of the network allows the remainder of the network to be easily visited, which provides suggestive patterns for city development. Yildirimoglu and Kim (2018) analyzed the urban traffic network by combining bus lines, passenger trajectories, and vehicle trajectories into a three-layered network. By applying the Louvain algorithm (Blondel et al. 2008) independently to the three layers, they found that aggregated patterns can shape geographically well-connected communities in the urban traffic network. The spatial structure is quite similar for the bus and passenger layers, which helps transit authorities make location decisions. From a planning perspective, the research is beneficial in that sub-regional borders designate the influential areas around local centers, shopping districts, school zones, etc., and cities can develop policies to improve accessibility to them and enhance network performance.

4.3 Social network analysis

Another hot topic of community detection research in multilayer networks is social network analysis (Alhajj and Rokne 2014). Social networks have been studied fairly extensively over the last couple of decades, mainly in the general context of analyzing interactions between people in order to determine important structural patterns in such interactions. With the availability of plentiful data from online social media such as Facebook, Twitter, and Flickr, there is an increasing tendency to discover community structures in such time-varying social networks (Alimadadi et al. 2019; Rozario et al. 2019; Zhou et al. 2007, 2016). The emergence of online social networks has altered the behavior of millions of web users, whose interactions with each other produce huge amounts of data on various activities. Facebook and Twitter, two of the most popular social media platforms, have been widely employed for social network analysis in recent years (Alimadadi et al. 2019; Türker and Sulak 2018).

The analysis of social networks is usually accompanied by various applications such as information propagation, international trade analysis, and the identification of influential spreaders. Diffusion processes, like the propagation of information or the spreading of diseases, are fundamental phenomena occurring in social networks. While the study of diffusion processes in single networks has received a great deal of interest from various disciplines for over a decade, diffusion on multilayer networks is still a young and promising research area, presenting many challenging research issues (Salehi et al. 2015). Numerous attempts have been made to uncover the community structures in international trade, typically represented as bipartite networks in which connections can be established between countries and industries (Alves et al. 2019). Biondo et al. (2017) present a multilayer network model with contagion dynamics, which is able to simulate the spreading of information and the transaction phase of a typical financial market. In their two-layered network framework, the first layer comprises the trading decisions of investors and the second layer captures the information dynamics, which helps to explain the aggregate behavior of markets. Basaras et al. (2017) proposed an effective method to detect influential spreaders in multilayer networks based on the underlying community structures. The experimental evaluation shows that the proposed method outperforms the major competitors proposed so far for either single-layer or multilayer networks.

The above-mentioned applications mainly focus on local structures or context-based community detection for networks with large volume and various dynamic changes. Thus, designing intelligent algorithms to mine valuable information from plentiful social network resources is of great significance.

4.4 Research on biological systems

Biological systems, from a cell to the human brain, are intrinsically complex (Ma’ayan 2017). Multilayer networks, described by an intricate network of relationships across multiple scales, are widely employed in representing such systems. The majority of biological processes are carried out by groups of proteins that are densely connected (Cui et al. 2012). The protein-protein interaction (PPI) network records the interactions among groups of proteins that interact closely with each other, and can be used to predict the complexity of the function of normal proteins (Srihari et al. 2017). In general, there are two typical kinds of protein communities: protein complexes and protein functional modules. Protein complexes are sets of proteins that interact with each other to execute a single multimolecular mechanism. Protein functional modules are sets of proteins that participate in a particular biological process and interact with each other at different times and places (Spirin and Mirny 2003). Recently, several studies have highlighted that simple networks, i.e., those obtained by aggregating or neglecting temporal or categorical descriptions of biological data, are not able to account for the richness of information characterizing biological systems (De Domenico 2018). Chen et al. (2018a) proposed the MLPCD algorithm by integrating Gene Expression Data (GED), together with a parallel solution of MLPCD using cloud computing technology. They reconstructed the weighted protein-protein interaction (WPPI) network by combining the PPI network with related GED, and then defined a simplified modularity as the ratio of in-degrees and out-degrees of proteins in a community. By utilizing an improved Louvain algorithm (Blondel et al. 2008), they detected both protein complexes and protein functional modules.

Although non-overlapping communities are more commonly studied in network neuroscience, a model of community structure that allows for overlapping networks offers a more realistic representation of brain network organization (Wu et al. 2011). Taking overlapping communities into consideration, Zhang et al. (2018a) propose a central edge selection (CES) based community detection algorithm for PPI networks. Experimental results on three benchmark networks and two PPI networks indicate the excellent performance of the proposed CES algorithm. Kurmukov et al. (2017) propose a framework to compare both overlapping and non-overlapping community structures of brain networks in a machine learning setting. The performance of the proposed framework is verified in the task of classifying Alzheimer’s disease, mild cognitive impairment, and healthy participants. Pan et al. (2018) present an aggregation approach to detect communities in multilayer biological networks, which first constructs a consensus graph from multiple networks and then applies traditional algorithms to detect communities. Motivated by the fact that few edges are shared among different networks, they merge the weights of edges from different layers and cut off the nodes with low weights. The approach is simple but limited in its application scenarios.
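The consensus-graph idea described above can be sketched in a few lines. The sketch makes several assumptions that are not taken from Pan et al. (2018): edges are undirected, edge weights are merged by summation, the weight of a node is taken as the sum of its incident merged edge weights, and pruning is done in a single pass against a hypothetical threshold `min_strength`.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Layer = Dict[Tuple[str, str], float]


def consensus_graph(
    layers: List[Layer], min_strength: float
) -> Dict[FrozenSet[str], float]:
    """Merge edge weights across layers and drop low-weight nodes."""
    merged: Dict[FrozenSet[str], float] = defaultdict(float)
    for layer in layers:
        for (u, v), w in layer.items():
            # frozenset makes (u, v) and (v, u) the same undirected edge.
            merged[frozenset((u, v))] += w

    # Node weight (assumption): total merged weight of incident edges.
    strength: Dict[str, float] = defaultdict(float)
    for edge, w in merged.items():
        for node in edge:
            strength[node] += w

    keep = {node for node, s in strength.items() if s >= min_strength}
    # Keep only edges whose endpoints both survive the pruning.
    return {edge: w for edge, w in merged.items() if edge <= keep}


# Two toy layers over proteins p1..p4 (made-up weights).
layer_a = {("p1", "p2"): 1.0, ("p2", "p3"): 1.0, ("p3", "p4"): 0.2}
layer_b = {("p2", "p1"): 1.0, ("p1", "p3"): 0.5}
consensus = consensus_graph([layer_a, layer_b], min_strength=1.0)
print(consensus)
```

Any single-layer community detection method can then be run on the resulting weighted graph. In the toy example the weakly attached node p4 is removed before that step.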

Another notable direction of biological research concerns human brain networks. Cantini et al. (2015) propose a multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify driver cancer genes. The multi-networks they focus on combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction, and gene co-expression networks. Combining the different layers helps to extract from the multi-networks indications of the regulatory pattern and functional role of both the already known and the new candidate driver genes. Sanchez-Rodriguez et al. (2019) introduce an approach for detecting modular organization by considering the temporal scales of the information flow over large-scale brain graphs, and find several organizational patterns in the brain anatomical and functional networks. These structures may coexist in a dynamical way determined by the temporal scales of the activity they produce, guaranteeing functional independence and coordination.

In brief, the discovery of underlying patterns in biological networks is flourishing. With the development of network science, multi-biological networks provide more plentiful data resources than ever before, which calls for greater dedication to this promising field.

5 Outlook

As an interdisciplinary research field with a variety of prospective applications, complex network science has been receiving increasing attention from the scientific community. Inspired by abundant real-world scenarios such as social networks, biological networks, and transportation networks, extensive research has been dedicated to the extraction of non-trivial knowledge from such networks. As the study has deepened, scholars have come to realize that many systems are inherently represented by a multilayer network, in which edges exist in multiple layers that encode different, but potentially related, types of interactions, and that it is important to uncover the interlayer community structures in a complex system.

This paper first presents the various formats of multilayer networks and then introduces the two basic mathematical models. Subsequently, the quality evaluation measures and several typical community detection algorithms are introduced, including label propagation-based algorithms, nonnegative matrix factorization, random walk methods, and multi-objective optimization methods. After a comprehensive analysis of the above-mentioned methods, we conclude that most of the existing methods are designed for multiplex networks, i.e., the nodes in each layer are aligned, which limits the research on the universal multilayer network format. Besides, the algorithms have high computational complexity and can hardly obtain reasonable partitions in large-scale multilayer networks. A great deal of work remains to be done in the future, such as designing more efficient algorithms for temporal networks with numerous layers and exploring the community structures in special formats of multilayer networks.