• arXiv.cs.SI Pub Date : 2020-01-18
Mengyuan Chen; Jiang Zhang; Zhang Zhang; Lun Du; Qiao Hu; Shuo Wang; Jiaqi Zhu

Network structures in various contexts play important roles in social, technological, and biological systems. However, the observable network structures in real cases are often incomplete or unavailable due to measurement errors or privacy protection issues. Therefore, inferring the complete network structure is useful for understanding complex systems. Existing studies have not fully solved the problem of inferring network structure with partial or no information about connections or nodes. In this paper, we tackle the problem by utilizing time series data generated by network dynamics. We regard network inference based on dynamical time series data as a problem of minimizing the error in predicting future states, and propose a novel data-driven deep learning model called Gumbel Graph Network (GGN) to solve two kinds of network inference problems: Network Reconstruction and Network Completion. For the network reconstruction problem, the GGN framework includes two modules: the dynamics learner and the network generator. For the network completion problem, GGN adds a new module called the States Learner to infer missing parts of the network. We carried out experiments on discrete and continuous time series data. The experiments show that our method can reconstruct up to 100% of the network structure on the network reconstruction task, and can infer the unknown parts of the structure with up to 90% accuracy when some nodes are missing, with accuracy decaying as the fraction of missing nodes increases. Our framework may have wide application in areas where the network structure is hard to obtain and time series data are rich.
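The model's name refers to the Gumbel-softmax trick, which lets a network generator sample discrete edges while staying differentiable. Below is a minimal stdlib sketch of that sampling step only, assuming the generator gives each ordered node pair two logits (no edge, edge); `gumbel_softmax` and `sample_adjacency` are hypothetical names, not the authors' code, and gradients are not shown.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed one-hot sample from a categorical given by `logits`."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(U))
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits
    top = max(noisy)
    exps = [math.exp(z - top) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_adjacency(edge_logits, tau=0.5, seed=0):
    """edge_logits[i][j] = (logit_no_edge, logit_edge); returns a soft 0/1 adjacency."""
    rng = random.Random(seed)
    n = len(edge_logits)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                adj[i][j] = gumbel_softmax(edge_logits[i][j], tau, rng)[1]
    return adj
```

Lower `tau` pushes samples closer to hard 0/1 edges at the cost of noisier gradients.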

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
Sohini Roy; Harish Chandrasekaran; Anamitra Pal; Arunabha Sen

The reliable and resilient operation of the smart grid necessitates a clear understanding of the intra- and inter-dependencies of its power and communication systems. This understanding can only be achieved by accurately depicting the interactions between the different components of these two systems. This paper presents a model, called the modified implicative interdependency model (MIIM), for capturing these interactions. Data obtained from a power utility in the U.S. Southwest are used to validate the model. The performance of the model for a specific power system application, namely state estimation, is demonstrated using the IEEE 118-bus system. The results indicate that the proposed model is more accurate than its predecessor, the implicative interdependency model (IIM) [1], in predicting the system state in case of failures in the power and/or communication systems.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
Xinxun Zeng; Shiqi Zhang; Bo Tang

The Influence Maximization Problem (IMP) is to select a seed set of nodes in a social network that spreads influence as widely as possible. It has many applications in multiple domains; e.g., viral marketing is frequently used to advertise new products or activities. Although it is a classic and well-studied problem in computer science, all the proposed techniques compromise among time efficiency, memory consumption, and result quality. In this paper, we conduct comprehensive experimental studies on the state-of-the-art approximate IMP approaches to reveal their underlying trade-off strategies. Interestingly, we find that even the state-of-the-art approaches are impractical when the propagation probability of the network is taken into consideration. Building on these findings, we propose a novel residual-based approach (i.e., RCELF) for IMP, which i) overcomes the deficiencies of existing approximate approaches, and ii) provides theoretically guaranteed results with high efficiency in both time and space. We demonstrate the superiority of our proposal through an extensive experimental evaluation on real datasets.
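RCELF builds on the classic CELF ("cost-effective lazy forward") greedy strategy. The sketch below shows plain CELF under the independent cascade model with Monte Carlo spread estimates, not the authors' residual-based variant; the function names, the single uniform propagation probability `prob`, and the run count are illustrative assumptions. Node labels are assumed to be mutually comparable (e.g. all ints) so heap ties resolve.

```python
import heapq
import random


def ic_spread(graph, seeds, prob, runs, rng):
    """Monte Carlo estimate of expected spread under the independent cascade model."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, ()):
                    # Each newly active node gets one chance to activate each neighbour
                    if v not in active and rng.random() < prob:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs


def celf(graph, k, prob=0.1, runs=200, seed=0):
    """Lazy-greedy (CELF) seed selection; graph maps node -> list of out-neighbours."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    # Min-heap on negated gain: (-marginal gain, node, seed-set size when computed)
    heap = [(-ic_spread(graph, [v], prob, runs, rng), v, 0) for v in nodes]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no other node can beat it
            seeds.append(v)
            spread += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and push back
            gain = ic_spread(graph, seeds + [v], prob, runs, rng) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread
```

With `prob=1.0` the cascade is deterministic, so on a graph `{0: [1, 2, 3, 4], 5: [6]}` and `k=2` the sketch picks the hub `0` and then `5`. The lazy re-evaluation is what saves time; the repeated Monte Carlo runs are what the abstract's time/quality trade-off refers to.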

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
Kangfei Zhao; Yu Rong; Jeffrey Xu Yu; Junzhou Huang; Hao Zhang

Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve vertex information at different granularities and to reduce problems in discrete space to machine learning tasks in continuous space. However, despite this fruitful progress, some kinds of graph applications, such as graph compression and edge partitioning, are very hard to reduce to graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications with a novel learning approach. Unlike greedy algorithms based on predefined heuristics, we propose a neural network model, Deep Order Network (DON), to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial orders, DON is able to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and make partial vertex order sampling efficient, we employ a reinforcement learning model, the Policy Network, to adjust the partial order sampling probabilities automatically during the training phase of DON. In this way, the Policy Network can improve training efficiency and guide DON to evolve towards a more effective model automatically. Comprehensive experiments on both synthetic and real data validate that DON-RL consistently outperforms the current state-of-the-art heuristic algorithm. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
Yuhui Zhao; Ning Yang; Tao Lin; Philip S. Yu

Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of existing works. First, existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, existing works often ignore the prediction of the infection order, which also plays an important role in social network analysis. Finally, existing works often require the underlying diffusion networks, which are likely unobservable in practice. In this paper, we aim to predict both node infection and infection order without requiring knowledge of the underlying diffusion mechanism or the diffusion network, where the challenges are two-fold. The first is which cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can capture not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework to learn node embeddings with cascade collaboration and node collaboration, so that the non-linearity of information cascades can be effectively captured. The results of extensive experiments conducted on real-world datasets verify the effectiveness of our approach.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
María Óskarsdóttir; Cristián Bravo; Wouter Verbeke; Carlos Sarraute; Bart Baesens; Jan Vanthienen

Relational learning in networked data has been shown to be effective in a number of studies. Relational learners, composed of relational classifiers and collective inference methods, enable the inference of nodes in a network given the existence and strength of links to other nodes. These methods have been adapted to predict customer churn in telecommunication companies, showing that incorporating them may give more accurate predictions. In this research, the performance of a variety of relational learners is compared by applying them to a number of call detail record (CDR) datasets originating from the telecommunication industry, with the goal of ranking them as a whole and investigating the effects of relational classifiers and collective inference methods separately. Our results show that collective inference methods do not improve the performance of relational classifiers, and that the best performing relational classifier is the network-only link-based classifier, which builds a logistic model using link-based measures for the nodes in the network.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
María Óskarsdóttir; Cristián Bravo; Wouter Verbeke; Carlos Sarraute; Bart Baesens; Jan Vanthienen

Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular, it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study, we benchmark different strategies for constructing a relational learner by applying them to a total of eight distinct call-detail record datasets originating from telecommunication organizations across the world. We statistically evaluate the effect of relational classifiers and collective inference methods on the predictive power of relational learners, as well as the performance of models where relational learners are combined with traditional methods of predicting customer churn in the telecommunication industry. Finally, we investigate the effect of network construction on model performance; our findings imply that the definition of edges and weights in the network does have an impact on the results of the predictive models. As a result of the study, the best configuration is a non-relational learner enriched with network variables, without collective inference, using binary weights and undirected networks. In addition, we provide guidelines on how to apply social network analytics for churn prediction in the telecommunication industry in an optimal way, ranging from network architecture to model building and evaluation.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-18
Chunheng Jiang; Jianxi Gao; Malik Magdon-Ismail

We study nonlinear dynamics on complex networks. Each vertex $i$ has a state $x_i$ which evolves according to networked dynamics to a steady state $x_i^*$. We develop fundamental tools to learn the true steady state of a small part of the network without knowing the full network. The naive approach, which is also the current state of the art, is to follow the dynamics of the observed partial network to a local equilibrium. This dramatically fails to extract the true steady state. We use a mean-field approach to map the dynamics of the unseen part of the network to a single node, which allows us to recover accurate steady-state estimates on as few as 5 observed vertices in domains ranging from ecology to social networks to gene regulation. Incomplete networks are the norm in practice, and we offer new ways to think about nonlinear dynamics when only sparse information is available.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-19

Clique counting is a fundamental task in network analysis, and even the simplest setting of $3$-cliques (triangles) has been the center of much recent research. Getting the count of $k$-cliques for larger $k$ is algorithmically challenging, due to the exponential blowup in the search space of large cliques. But a number of recent applications (especially for community detection or clustering) use larger clique counts. Moreover, one often desires \textit{local} counts, the number of $k$-cliques per vertex/edge. Our main result is Pivoter, an algorithm that exactly counts the number of $k$-cliques, \textit{for all values of $k$}. It is surprisingly effective in practice, and is able to get clique counts of graphs that were beyond the reach of previous work. For example, Pivoter gets all clique counts in a social network with 100M edges within two hours on a commodity machine, whereas previous parallel algorithms do not terminate within days. Pivoter can also feasibly get local per-vertex and per-edge $k$-clique counts (for all $k$) for many public data sets with tens of millions of edges. To the best of our knowledge, this is the first algorithm that achieves such results. The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph. It is built using a technique called \textit{pivoting}, a classic approach by Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for maximal cliques. Remarkably, the SCT can be built without actually enumerating all cliques, and provides a succinct data structure from which exact clique statistics ($k$-clique counts, local counts) can be read off efficiently.
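The counting idea behind the SCT can be illustrated with a small stdlib sketch, assuming an undirected simple graph given as a dict of neighbour sets. It is not the authors' implementation: it skips the degeneracy ordering and engineering a practical tool needs, and it counts during the pivoting recursion instead of storing the tree. Each root-to-leaf path fixes some "held" vertices (always in the clique) and some "pivot" vertices (optional), so a leaf with `h` held and `p` pivot vertices contributes $\binom{p}{k-h}$ cliques of size $k$.

```python
from collections import defaultdict
from math import comb


def count_all_cliques(adj):
    """Count k-cliques for every k via Bron-Kerbosch-style pivoting.

    adj maps vertex -> set of neighbours (undirected, no self-loops).
    Returns {k: number of k-cliques}; key 0 counts the empty clique.
    """
    counts = defaultdict(int)

    def rec(P, held, pivots):
        if not P:
            # Held vertices are mandatory, pivot vertices optional
            for k in range(held, held + pivots + 1):
                counts[k] += comb(pivots, k - held)
            return
        # Pivot with the most neighbours inside P (Tomita-style choice)
        u = max(P, key=lambda w: len(P & adj[w]))
        # Cliques inside {u} union (P & N(u)): u is an optional pivot vertex
        rec(P & adj[u], held, pivots + 1)
        # Cliques containing a non-neighbour v of u, v the first such in order
        for v in P - adj[u] - {u}:
            rec(P & adj[v], held + 1, pivots)
            P = P - {v}  # later branches must not contain v again

    rec(set(adj), 0, 0)
    return dict(counts)
```

On a triangle with a pendant vertex (edges 0-1, 0-2, 1-2, 2-3) this returns 4 vertices, 4 edges, and 1 triangle, without enumerating any clique explicitly.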

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-19
Meysam Asgari-Chenaghlu; M. Reza Feizi-Derakhshi; Leili Farzinvash; Cina Motamed

Named Entity Recognition (NER) from social media posts is a challenging task. User-generated content, which makes up most of social media, is noisy and contains grammatical and linguistic errors. This noisy content makes tasks such as named entity recognition much harder. However, some applications, like automatic journalism or information retrieval from social media, require more information about the entities mentioned in groups of social media posts. Conventional methods give acceptable results on structured, well-formed documents, but they are not satisfactory on new user-generated media. One valuable piece of information about an entity is the image related to the text. Combining this multimodal data reduces ambiguity and provides wider information about the entities mentioned. To address this issue, we propose a novel approach based on multimodal deep learning. Our solution provides more accurate results on the named entity recognition task. Experimental results in terms of precision, recall, and F1 score show the superiority of our work compared to other state-of-the-art NER solutions.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-19
Thiago C. Silva; Diego R. Amancio; Benjamin M. Tabak

We study a novel economic network composed of wire transfers (electronic payment transactions) among the universe of firms in Brazil (6.2 million firms). We construct a directed and weighted network in which vertices represent cities and edges denote pairwise economic dependence between cities. Each city (vertex) represents the collection of all firms within that city. Edge weights are given by the total amount of wire transfers arising from business transactions between firms located in different cities. The rationale is that the more two cities transact with each other, the more economically dependent they become. We find a high degree of economic integration among cities in the trade network, which is consistent with the high degree of specialization found across Brazilian cities. Using network centrality measures, we are able to identify which cities have a dominant role in the entire supply chain process. We find that the trade network has a disassortative mixing pattern, which is consistent with the power-law shape of the firm size distribution in Brazil. After the Brazilian recession in 2014, we find that the disassortativity becomes even stronger as a result of the death of many small firms and the consequent concentration of economic flows on large firms. Our results suggest that recessions have a large impact on the trade network, with meaningful and heterogeneous economic consequences across municipalities.
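"Disassortative mixing" means high-degree vertices tend to link to low-degree ones, which shows up as a negative degree assortativity coefficient $r$. The study's network is directed and weighted; purely to illustrate the sign convention, the sketch below computes Newman's coefficient for an undirected, unweighted simple graph given as an edge list (a simplifying assumption, not the paper's exact measure).

```python
def degree_assortativity(edges):
    """Newman's degree assortativity r for an undirected, unweighted simple graph.

    r > 0: similar degrees connect; r < 0: hubs connect to low-degree vertices.
    Undefined (division by zero) when every vertex has the same degree.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    # Average product of the degrees at both ends of an edge
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    # First and second moments of end-point degrees, each edge counted both ways
    mean = sum(degree[u] + degree[v] for u, v in edges) / (2 * m)
    mean_sq = sum(degree[u] ** 2 + degree[v] ** 2 for u, v in edges) / (2 * m)
    return (mean_prod - mean ** 2) / (mean_sq - mean ** 2)
```

A star, where one hub links only to leaves, is perfectly disassortative: `degree_assortativity([(0, 1), (0, 2), (0, 3), (0, 4)])` gives `-1.0`.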

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-20
Jeremy Kepner; Tim Davis; Chansup Byun; William Arcand; David Bestor; William Bergeron; Vijay Gadepally; Matthew Hubbell; Michael Houle; Michael Jones; Anna Klein; Peter Michaleas; Lauren Milechin; Julie Mullen; Andrew Prout; Antonio Rosa; Siddharth Samsi; Charles Yee; Albert Reuther

The SuiteSparse GraphBLAS C-library implements high-performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into hypersparse matrices. The parameters of hierarchical hypersparse matrices control the number of entries allowed in each level of the hierarchy before an update is cascaded to the next level. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
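The cascading scheme can be sketched with plain Python dicts standing in for hypersparse matrices. This is an illustrative assumption, not the GraphBLAS API: `HierarchicalHypersparse` and its `cuts` parameter are hypothetical. Updates land in a small, fast level 0; when a level holds more than its cut of entries, it is added into the next level and cleared, and the last level is unbounded.

```python
class HierarchicalHypersparse:
    """Toy hierarchy of dict-of-keys sparse matrices with cascading updates."""

    def __init__(self, cuts):
        # cuts[k] = max number of entries level k may hold before cascading
        self.cuts = list(cuts)
        self.levels = [dict() for _ in range(len(self.cuts) + 1)]

    def update(self, i, j, value):
        level0 = self.levels[0]
        level0[(i, j)] = level0.get((i, j), 0) + value
        for k, cut in enumerate(self.cuts):
            if len(self.levels[k]) <= cut:
                break  # deeper levels only grow when this one cascades
            nxt = self.levels[k + 1]
            for key, v in self.levels[k].items():
                nxt[key] = nxt.get(key, 0) + v  # matrix addition into next level
            self.levels[k].clear()

    def to_dict(self):
        """Sum all levels to recover the full matrix."""
        total = {}
        for level in self.levels:
            for key, v in level.items():
                total[key] = total.get(key, 0) + v
        return total
```

Small cuts keep the hot level cache-friendly; larger cuts mean fewer, bigger merges. Tuning that trade-off is what the parameters in the abstract control.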

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-20
Dany Kamuhanda; Meng Wang; Kun He

Local community detection consists of finding a group of nodes closely related to the seeds, a small set of nodes of interest. Such a group of nodes is densely connected internally, or has a higher probability of internal connections than of connections to other clusters in the network. Existing local community detection methods focus on finding either one local community that all seeds are most likely to be in, or a single community for each seed. However, a seed member may belong to multiple overlapping local communities. In this work, we present a novel method of detecting multiple local communities to which a single seed member belongs. The proposed method consists of three key steps: (1) local sampling with Personalized PageRank (PPR); (2) using the sparseness generated by a sparse nonnegative matrix factorization (SNMF) to estimate the number of communities in the sampled subgraph; (3) using SNMF soft community membership vectors to assign nodes to communities. The proposed method shows favorable accuracy compared to state-of-the-art community detection methods in experiments on a combination of artificial and real-world networks.
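Step (1) can be sketched as Personalized PageRank by power iteration, keeping the highest-scoring nodes as the local sample. The restart probability `alpha`, the iteration count, and the top-n cut-off are illustrative assumptions, and the SNMF steps (2) and (3) are not shown.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for PPR: restart to `seed` with probability alpha.

    adj maps every node to a non-empty list of neighbours (undirected graph,
    no isolated nodes), so the random-walk transition matrix is row-stochastic.
    """
    score = {v: 0.0 for v in adj}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] = alpha  # teleport mass always returns to the seed
        for u, nbrs in adj.items():
            share = (1.0 - alpha) * score[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        score = nxt
    return score


def local_sample(adj, seed, size):
    """Keep the `size` nodes with the highest PPR score as the local subgraph."""
    ppr = personalized_pagerank(adj, seed)
    return set(sorted(ppr, key=ppr.get, reverse=True)[:size])
```

On two triangles joined by a bridge, seeding in one triangle concentrates the score there, so `local_sample(adj, 0, 3)` returns that triangle.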

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-20
Simiao Jiao; Zihui Xue; Xiaowei Chen; Yuedong Xu

Graphlets are induced subgraph patterns that are crucial to understanding the structure and function of a large network. Much effort has been devoted to calculating graphlet statistics, where random walk based approaches are commonly used to access restricted graphs through the available application programming interfaces (APIs). However, most of them consider only individual networks, overlooking the strong coupling between different networks. In this paper, we estimate the graphlet concentration in multi-layer networks with real-world applications. An inter-layer edge connects two nodes in different layers if they belong to the same person. Access to a multi-layer network is restricted in the sense that the upper layer allows random walk sampling, whereas the nodes of lower layers can be accessed only through the inter-layer edges and support only random node or edge sampling. To cope with this new challenge, we define a suite of two-layer graphlets and propose a novel random walk sampling algorithm to estimate the proportion of all the 3-node graphlets. An analytical bound on the number of sampling steps is proved to guarantee the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimation accuracies of different graphlets when the sample size is split across layers. Experimental evaluation on real-world and synthetic multi-layer networks demonstrates the accuracy and high efficiency of our unbiased estimators.
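For intuition, here is the standard single-layer version of random-walk graphlet estimation, not the paper's two-layer algorithm. In a stationary simple random walk, three consecutive nodes $(x_0, x_1, x_2)$ appear with probability $1/(2m\,d(x_1))$, so reweighting each non-backtracking window by $d(x_1)$ corrects the bias; a triangle is seen from 3 centres, so its weight is divided by 3. The function name, burn-in, and step counts are assumptions.

```python
import random


def triangle_concentration(adj, steps=20000, burn_in=1000, seed=0):
    """Random-walk estimate of the triangle share among connected 3-node graphlets.

    adj maps node -> set of neighbours (undirected, connected, no isolated nodes).
    Returns triangles / (triangles + open wedges) as a ratio estimate.
    """
    rng = random.Random(seed)
    nbrs = {v: sorted(ns) for v, ns in adj.items()}  # lists for rng.choice
    prev, cur = None, rng.choice(sorted(nbrs))
    closed = opened = 0.0
    for t in range(burn_in + steps):
        prev2, prev, cur = prev, cur, rng.choice(nbrs[cur])
        # Skip burn-in, the first window, and backtracking (x0 == x2) windows
        if t < burn_in or prev2 is None or prev2 == cur:
            continue
        weight = len(nbrs[prev])  # inverse of the 1 / deg(x1) sampling bias
        if cur in adj[prev2]:
            closed += weight / 3.0  # each triangle is visited from 3 centres
        else:
            opened += weight
    total = closed + opened
    return closed / total if total else 0.0
```

Complete graphs contain only triangles and stars only open wedges, so the estimate is exactly 1.0 and 0.0 there; on real graphs it converges as `steps` grows.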

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-21
Takafumi J. Suzuki

Document networks are found in various collections of real-world data, such as citation networks, hyperlinked web pages, and online social networks. A large number of generative models have been proposed because they offer intuitive and useful pictures for analyzing document networks. Prominent examples are relational topic models, where documents are linked according to their topic similarities. However, existing generative models do not make full use of network structures because they are largely dependent on topic modeling of documents. In particular, centrality of graph nodes is missing in generative processes of previous models. In this paper, we propose a novel generative model for document networks by introducing random walkers on networks to integrate the node centrality into link generation processes. The developed method is evaluated in semi-supervised classification tasks with real-world citation networks. We show that the proposed model outperforms existing probabilistic approaches especially in detecting communities in connected networks.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-21
Benedek Rozemberczki; Rik Sarkar

A graph embedding is a representation of graph vertices in a low-dimensional space, which approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings using a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. Its computational efficiency is superior to previous methods due to simpler sequence generation, and it produces more accurate results. In experiments, we found that the performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results compared to other sequence-based embedding methods.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-21
Antonis Papasavva; Savvas Zannettou; Emiliano De Cristofaro; Gianluca Stringhini; Jeremy Blackburn

This paper presents a dataset with over 3.3M threads and 134.5M posts from the Politically Incorrect board (/pol/) of the imageboard forum 4chan, posted over a period of almost 3.5 years (June 2016-November 2019). To the best of our knowledge, this represents the largest publicly available 4chan dataset, providing the community with an archive of posts that have been permanently deleted from 4chan and are otherwise inaccessible. We augment the data with a set of additional labels, including toxicity scores and the named entities mentioned in each post. We also present a statistical analysis of the dataset, providing an overview of what researchers interested in using it can expect, as well as a simple content analysis, shedding light on the most prominent discussion topics, the most popular entities mentioned, and the level of toxicity in each post. Overall, we are confident that our work will further motivate and assist researchers in studying and understanding 4chan as well as its role on the greater Web. For instance, we hope this dataset may be used for cross-platform studies of social media, as well as for other types of research like natural language processing. Finally, our dataset can assist qualitative work focusing on in-depth case studies of specific narratives, events, or social theories.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2018-01-06
Krishna Dasaratha; Benjamin Golub; Nir Hak

Agents learn about a changing state using private signals and past actions of neighbors in a network. We characterize equilibrium learning and social influence in this setting. We then examine when agents can aggregate information well, responding quickly to recent changes. A key sufficient condition for good aggregation is that each individual's neighbors have sufficiently different types of private information. In contrast, when signals are homogeneous, aggregation is suboptimal on any network. We also examine behavioral versions of the model, and show that achieving good aggregation requires a sophisticated understanding of correlations in neighbors' actions. The model provides a Bayesian foundation for a tractable learning dynamic in networks, closely related to the DeGroot model, and offers new tools for counterfactual and welfare analyses.
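For reference, the DeGroot dynamic the abstract relates its model to updates every agent's belief to a weighted average of neighbours' beliefs, $x(t+1) = W x(t)$ with a row-stochastic trust matrix $W$. A minimal stdlib sketch (the matrix and beliefs are illustrative):

```python
def degroot(weights, beliefs, rounds):
    """Iterate x(t+1) = W x(t); each row of W is non-negative and sums to 1."""
    x = list(beliefs)
    for _ in range(rounds):
        # Each agent's new belief is a trust-weighted average of current beliefs
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]
    return x


# Three agents on a line; W is also column-stochastic, so consensus is the mean
W = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
print(degroot(W, [0.0, 3.0, 6.0], rounds=100))  # all entries approach 3.0
```

Unlike this fixed-state consensus process, the paper's agents track a changing state and also receive private signals each period.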

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2018-08-26
Orowa Sikder; Robert E. Smith; Pierpaolo Vivo; Giacomo Livan

Online social networks provide users with unprecedented opportunities to engage with diverse opinions. At the same time, they enable confirmation bias on large scales by empowering individuals to self-select narratives they want to be exposed to. A precise understanding of such tradeoffs is still largely missing. We introduce a social learning model where most participants in a network update their beliefs unbiasedly based on new information, while a minority of participants reject information that is incongruent with their preexisting beliefs. This simple mechanism generates permanent opinion polarization and cascade dynamics, and accounts for the aforementioned tradeoff between confirmation bias and social connectivity through analytic results. We investigate the model's predictions empirically using US county-level data on the impact of Internet access on the formation of beliefs about global warming. We conclude by discussing policy implications of our model, highlighting the downsides of debunking and suggesting alternative strategies to counter misinformation.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2019-06-04
Samin Aref; Zachary Neal

We propose new mathematical programming models for optimal partitioning of a signed graph into cohesive groups. To demonstrate the approach's utility, we apply it to identify coalitions in US Congress since 1979 and examine the impact of polarized coalitions on the effectiveness of passing bills. Our models produce a globally optimal solution to the NP-hard problem of minimizing the total number of intra-group negative and inter-group positive edges. We tackle the intensive computations of dense signed networks by providing upper and lower bounds, then solving an optimization model which closes the gap between the two bounds and returns the optimal partitioning of vertices. Our substantive findings suggest that the dominance of an ideologically homogeneous coalition (i.e. partisan polarization) can be a protective factor that enhances legislative effectiveness.
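The objective being minimized, the number of intra-group negative edges plus inter-group positive edges, can be made concrete with a brute-force stand-in. The paper solves it with mathematical programming and bounds; exhaustive search, as below, is only feasible for a handful of nodes, and the function names are hypothetical.

```python
from itertools import product


def frustration(edges, labels):
    """Count intra-group negative edges plus inter-group positive edges.

    edges: (u, v, sign) triples with sign +1 or -1; labels[u] is u's group.
    """
    return sum(
        1
        for u, v, sign in edges
        if (sign < 0 and labels[u] == labels[v]) or (sign > 0 and labels[u] != labels[v])
    )


def min_frustration_bruteforce(n, edges):
    """Try every assignment of n nodes to at most n groups; return (cost, labels)."""
    best = None
    for labels in product(range(n), repeat=n):
        cost = frustration(edges, labels)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best
```

A triangle with two positive edges and one negative edge cannot be split cleanly, so its optimum is 1 however the nodes are grouped; a graph with two positive pairs linked only by negative edges reaches 0.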

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2019-10-04
Bo Wu; Wen-Huang Cheng; Peiye Liu; Bei Liu; Zhaoyang Zeng; Jiebo Luo

"SMP Challenge" aims to discover novel prediction tasks for the abundant data on social multimedia and to seek excellent research teams. Making predictions via social multimedia data (e.g., photos, videos, or news) not only helps us make better strategic decisions for the future, but also explores advanced predictive learning and analytic methods on various problems and scenarios, such as multimedia recommendation, advertising systems, and fashion analysis. In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting the future interaction or attractiveness (in terms of clicks, views, likes, etc.) of new online posts in social media feeds before they are uploaded. We also collected and released a large-scale benchmark, SMPD, with over 480K posts from 69K users. In this paper, we define the challenge problem, give an overview of the dataset, present statistics of its rich information and annotations, and design the accuracy and correlation evaluation metrics for temporal popularity prediction in the challenge.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2019-12-28
Yaqing Wang; Weifeng Yang; Fenglong Ma; Jin Xu; Bin Zhong; Qiang Deng; Jing Gao

Today, social media has become the primary source of news. Via social media platforms, fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk. Therefore, it is extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, training such models requires a large amount of labeled data, and manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and cannot represent news articles on newly emerged events. Therefore, obtaining fresh and high-quality labeled samples is the major challenge in employing deep learning models for fake news detection. To tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, WeFEND, which leverages users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector, and the fake news detector. The annotator automatically assigns weak labels to unlabeled news based on users' reports. The reinforced selector uses reinforcement learning techniques to choose high-quality samples from the weakly labeled data and to filter out low-quality ones that may degrade the detector's prediction performance. The fake news detector aims to identify fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and the associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with state-of-the-art methods.

Updated: 2020-01-22
• arXiv.cs.SI Pub Date : 2020-01-16
Diogo Pacheco; Pik-Mai Hui; Christopher Torres-Lugo; Bao Tran Truong; Alessandro Flammini; Filippo Menczer

Coordinated campaigns are used to influence and manipulate social media platforms and their users, a critical challenge to the free exchange of information online. Here we introduce a general network-based framework to uncover groups of accounts that are likely coordinated. The proposed method constructs coordination networks based on arbitrary behavioral traces shared among accounts. We present five case studies of influence campaigns in the diverse contexts of U.S. elections, Hong Kong protests, the Syrian civil war, and cryptocurrencies. In each of these cases, we detect networks of coordinated Twitter accounts by examining their identities, images, hashtag sequences, retweets, and temporal patterns. The proposed framework proves to be broadly applicable to uncovering different kinds of coordination across information warfare scenarios.
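The core construction, linking accounts through shared behavioural traces, amounts to projecting an account-trace bipartite graph onto accounts. In the sketch below the edge weight is simply the number of shared traces with a fixed threshold; this is a simplifying assumption (a weighted similarity such as cosine could be used instead), and the function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coordination_network(traces, min_shared=2):
    """Link accounts that share at least `min_shared` behavioural traces.

    traces maps account -> iterable of trace ids (hashtags, image hashes, ...).
    Returns {(a, b): shared_count} with a < b.
    """
    # Invert to trace -> accounts: the bipartite account-trace graph
    by_trace = defaultdict(set)
    for account, items in traces.items():
        for item in set(items):
            by_trace[item].add(account)
    # Project onto accounts: every shared trace adds 1 to the pair's weight
    weights = defaultdict(int)
    for accounts in by_trace.values():
        for a, b in combinations(sorted(accounts), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_shared}
```

For example, two accounts posting the same two hashtags are linked with weight 2, while an account sharing only one hashtag with them falls below the threshold.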

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16

Online social networks (OSNs) are ubiquitous, attracting millions of users all over the world. As a popular communication medium, OSNs are exploited in a variety of cyber attacks. In this article, we discuss the Chameleon attack technique, a new type of OSN-based trickery in which malicious posts and profiles change the way they are displayed to OSN users to conceal themselves before the attack or to avoid detection. Using this technique, adversaries can, for example, avoid censorship by concealing true content when it is about to be inspected; acquire social capital to promote new content while piggybacking on a trending one; and cause embarrassment and serious reputation damage by tricking a victim into liking, retweeting, or commenting on a message that he or she would not normally engage with, without any indication of the trickery within the OSN. An experiment performed with closed Facebook groups of sports fans shows that (1) Chameleon pages can pass the moderation filters by changing the way their posts are displayed and (2) moderators do not distinguish between regular and Chameleon pages. We list the OSN weaknesses that facilitate the Chameleon attack and propose a set of mitigation guidelines.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Jialu Bao; Kun He; Xiaodong Xin; Bart Selman; John E. Hopcroft

The hidden community is a new graph-theoretical concept recently proposed in [4], where the authors also propose a meta-approach called HICODE (Hidden Community Detection). Experiments demonstrate that HICODE is able to uncover previously overshadowed weak layers and to uncover both weak and strong layers with higher accuracy. However, the authors provide no theoretical guarantee for its performance. In this work, we focus our theoretical analysis of HICODE on synthetic two-layer networks, where the layers are independent of each other and each layer is generated by a stochastic block model. We bridge this gap through two-layer stochastic block model networks in the following aspects: 1) we show that partitions that locally optimize modularity correspond to layers, indicating that modularity-optimizing algorithms can detect strong layers; 2) we prove that when reducing found layers, HICODE increases the absolute modularities of all unreduced layers, showing that its layer reduction step makes weak layers more detectable. Our work builds a solid theoretical base for HICODE, demonstrating that it is promising in uncovering both weak and strong layers of communities in two-layer networks.
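
The layer-detection argument above rests on Newman-Girvan modularity. As a reference point, here is a minimal pure-Python sketch of modularity for an undirected, unweighted graph in the standard form $Q=\sum_c [L_c/m - (d_c/2m)^2]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. This is not the HICODE procedure itself, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: list of (u, v) pairs (no duplicate edges or self-loops assumed).
    community: dict mapping node -> community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single bridge edge, splitting the graph into the two triangles gives $Q = 2(3/7 - 1/4) = 5/14$, while putting every node in one community gives $Q = 0$.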

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Joaquín J. Torres; Ginestra Bianconi

Simplicial complexes constitute the underlying topology of interacting complex systems including, among others, brain and social interaction networks. They are generalized network structures that make it possible to go beyond the framework of pairwise interactions and to capture the many-body interactions between two or more nodes that strongly affect dynamical processes. In fact, the topology of simplicial complexes allows one to assign a dynamical variable not only to the nodes of the interacting complex system but also to links, triangles, and so on. Here we show evidence that the dynamics defined on simplices of different dimensions can be significantly different even if we compare dynamics of simplices belonging to the same simplicial complex. By investigating the spectral properties of the simplicial complex model called "Network Geometry with Flavor", we provide evidence that the up and down higher-order Laplacians can have a finite spectral dimension whose value increases as the order of the Laplacian increases. Finally, we discuss the implications of this result for higher-order diffusion defined on simplicial complexes.
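
To make the up and down higher-order Laplacians concrete: with boundary matrices $B_k$ mapping oriented $k$-simplices to their $(k-1)$-faces, the order-$k$ Laplacians are $L_k^{\mathrm{down}} = B_k^\top B_k$ and $L_k^{\mathrm{up}} = B_{k+1} B_{k+1}^\top$. The following minimal pure-Python sketch (not the authors' code, and not the Network Geometry with Flavor model) builds these matrices for simplices given as sorted vertex tuples, using the usual alternating-sign orientation.

```python
def boundary_matrix(k_simplices, faces):
    """B_k: rows indexed by (k-1)-simplices `faces`, columns by `k_simplices`.

    Simplices are sorted vertex tuples; removing vertex i gives sign (-1)**i.
    """
    row = {f: r for r, f in enumerate(faces)}
    B = [[0] * len(k_simplices) for _ in faces]
    for c, s in enumerate(k_simplices):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]
            B[row[face]][c] = (-1) ** i
    return B


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    # Row of A dotted with each column of B.
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def hodge_laplacians(B_k, B_k1):
    """Return (L_down, L_up) of order k, given B_k and B_{k+1}."""
    L_down = matmul(transpose(B_k), B_k)
    L_up = matmul(B_k1, transpose(B_k1))
    return L_down, L_up


if __name__ == "__main__":
    nodes = [(0,), (1,), (2,)]
    edges = [(0, 1), (0, 2), (1, 2)]
    triangles = [(0, 1, 2)]
    B1 = boundary_matrix(edges, nodes)
    B2 = boundary_matrix(triangles, edges)
    L_down, L_up = hodge_laplacians(B1, B2)
    # For a single filled triangle, L_1 = L_down + L_up equals 3 * identity.
    print([[d + u for d, u in zip(rd, ru)] for rd, ru in zip(L_down, L_up)])
```

A useful sanity check is that $B_1 B_2 = 0$ (the boundary of a boundary vanishes); the up and down parts are individually non-trivial even when their sum is as simple as $3I$.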

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Matúš Medo; Manuel S. Mariani; Linyuan Lü

Online news can quickly reach and affect millions of people, yet little is known about potential dynamical regularities that govern their impact on the public. By analyzing data collected from two nation-wide news outlets, we demonstrate that the impact dynamics of online news articles does not exhibit popularity patterns found in many other social and information systems. In particular, we find that the news comment count follows a universal exponential distribution which is explained by the lack of the otherwise omnipresent rich-get-richer mechanism. Exponential aging induces a universal dynamics of article impact. We finally find that the readers' collective attention does "stretch" in the presence of high-impact articles, thus effectively canceling possible competition among the articles. Our findings challenge the generality of widespread popularity dynamics patterns as well as common assumptions of attention economy, suggesting the need to critically reconsider the assumption that collective attention is inherently limited.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Viet Duong; Phu Pham; Ritwik Bose; Jiebo Luo

Recently, the emergence of the #MeToo trend on social media has empowered thousands of people to share their own sexual harassment experiences. This viral trend, in conjunction with the massive personal information and content available on Twitter, presents a promising opportunity to extract data-driven insights to complement the ongoing survey-based studies about sexual harassment in college. In this paper, we analyze the influence of the #MeToo trend on a pool of college followers. The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories, and there exists a significant correlation between the prevalence of this trend and official reports in several major geographical regions. Furthermore, we discover the salient sentiments of the #MeToo tweets using deep semantic meaning representations and their implications for the affected users experiencing different types of sexual harassment. We hope this study can raise further awareness regarding sexual misconduct in academia.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Ryan Rossi; Somdeb Sarkhel; David Arbour; Nesreen Ahmed

In this work, we formalize the problem of causal inference over graph-based relational time-series data where each node in the graph has one or more time-series associated with it. We propose causal inference models for this problem that leverage both the graph topology and time-series to accurately estimate local causal effects of nodes. Furthermore, the relational time-series causal inference models are able to estimate local effects for individual nodes by exploiting local node-centric temporal dependencies and topological/structural dependencies. We show that simpler causal models that do not consider the graph topology are recovered as special cases of the proposed relational time-series causal inference model. We describe the conditions under which the resulting estimate can be used to estimate a causal effect, and describe how the Durbin-Wu-Hausman test of specification can be used to test for the consistency of the proposed estimator from data. Empirically, we demonstrate the effectiveness of the causal inference models on both synthetic data with known ground-truth and a large-scale observational relational time-series data set collected from Wikipedia.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-16
Mahdi Bohlouli; Jens Dalter; Mareike Dornhöfer; Johannes Zenkert; Madjid Fathi

In today's competitive business world, awareness of customer needs and market-oriented production are key success factors for industries. To this end, the use of efficient analytic algorithms ensures a better understanding of customer feedback and improves the next generation of products. Accordingly, the dramatic increase in the use of social media in daily life provides beneficial sources for market analytics. The main challenge, however, is how traditional analytic algorithms and methods can scale up to such disparate and multi-structured data sources. This paper presents and discusses the technological and scientific focus of SoMABiT, a social media analysis platform using big data technology. Sentiment analysis is employed to discover knowledge from social media. The main novelty of the proposed concept, compared with state-of-the-art technologies, is the use of MapReduce and the development of a distributed algorithm towards an integrated platform that can scale to any data volume and provide social media-driven knowledge.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2019-09-04
Bas Hofstra; Vivek V. Kulkarni; Sebastian Munoz-Najar Galvez; Bryan He; Dan Jurafsky; Daniel A. McFarland

Prior work finds a diversity paradox: diversity breeds innovation, and yet, underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-population of ~1.2 million US doctoral recipients from 1977-2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: e.g., novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity's role in innovation and partly explains the underrepresentation of some groups in academia.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2019-10-18
Feng Shi; James Evans

Breakthrough discoveries and inventions involve unexpected combinations of contents including problems, methods, and natural entities, and also diverse contexts such as journals, subfields, and conferences. Drawing on data from tens of millions of research papers, patents, and researchers, we construct models that predict next year's content and context combinations with an AUC of 95% based on embeddings constructed from high-dimensional stochastic block models, where the improbability of new combinations itself predicts up to 50% of the likelihood that they will gain outsized citations and major awards. Most of these breakthroughs occur when problems in one field are unexpectedly solved by researchers from a distant field. These findings demonstrate the critical role of surprise in scientific and technological advance, and enable evaluation of how well scientific institutions, ranging from education and peer review to awards, support it.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2019-12-06
Shuqi Xu; Qianming Zhang; Linyuan Lv; Manuel Sebastian Mariani

Over the past decade, many startups have sprung up, creating a huge demand for financial support from venture investors. However, due to the information asymmetry between investors and companies, the financing process is usually challenging and time-consuming, especially for startups that have not yet obtained any investment. Because of this, effective data-driven techniques to automatically match startups with potentially relevant investors would be highly desirable. Here, we analyze 34,469 valid investment events collected from www.itjuzi.com and consider the cold-start problem of recommending investors for new startups. We address this problem by constructing different tripartite network representations of the data where nodes represent investors, companies, and companies' domains. First, we find that investors have strong domain preferences when investing, which motivates us to introduce virtual links between investors and investment domains in the tripartite network construction. Our analysis of the recommendation performance of diffusion-based algorithms applied to various network representations indicates that prospective investors for new startups are effectively revealed by integrating network diffusion processes with investors' domain preferences.

Updated: 2020-01-17
• arXiv.cs.SI Pub Date : 2020-01-15
Alessandro Balestrucci

Social media are evolving into a pervasive source of news, able to reach a larger audience through their spreading power. The main drawback is the presence of malicious accounts, known as social bots, which are often used to diffuse misleading information. Social bots are automated accounts whose goal is to interact with humans and influence them. Starting from the definition of credulous users (i.e., human accounts with a high percentage of bot friends among their followees), in this work we aim to single out a regression model to derive, with an acceptable margin of error, the percentage of bot-followees of a human-operated account. The advantage lies in knowing, as a preventive measure, which users may be the target of bots' activities, and hence more exposed to misleading/unreliable content. Our results show that the best regression model achieves a Mean Absolute Error of 3.62% and a Root Mean Squared Error of 5.96%, thus encouraging further research in this direction.
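
For readers comparing the two reported error figures: MAE averages absolute errors, while RMSE squares them before averaging and so weights large misses more heavily, which is why RMSE is always at least as large as MAE. A minimal sketch with made-up percentages (not the paper's data or model):

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical true vs. predicted bot-followee percentages for four accounts.
truth = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 30.0, 46.0]
print(mae(truth, pred), rmse(truth, pred))
```

Here the errors are 2, 2, 0 and 6 points, so MAE is 2.5 while RMSE is $\sqrt{11} \approx 3.32$; the single 6-point miss pulls RMSE up much more than MAE.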

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Jiajing Wu; Jieli Liu; Weili Chen; Huawei Huang; Zibin Zheng; Yan Zhang

As the first decentralized peer-to-peer (P2P) cryptocurrency system allowing people to trade with pseudonymous addresses, Bitcoin has become increasingly popular in recent years. However, the P2P and pseudonymous nature of Bitcoin makes transactions on this platform very difficult to track, thus triggering the emergence of various illegal activities in the Bitcoin ecosystem. In particular, mixing services in Bitcoin, originally designed to enhance transaction anonymity, have been widely employed for money laundering to complicate the tracing of illicit funds. In this paper, we focus on the detection of addresses belonging to mixing services, which is an important task for anti-money laundering in Bitcoin. Specifically, we provide a feature-based network analysis framework to identify statistical properties of mixing services at three levels, namely, the network level, account level and transaction level. To better characterize the transaction patterns of different types of addresses, we propose the concept of Attributed Temporal Heterogeneous motifs (ATH motifs). Moreover, to deal with the issue of imperfect labeling, we treat the mixing detection task as a Positive and Unlabeled learning (PU learning) problem and build a detection model by leveraging the considered features. Experiments on real Bitcoin datasets demonstrate the effectiveness of our detection model and the importance of hybrid motifs, including ATH motifs, in mixing detection.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Qianlan Bai; Chao Zhang; Yuedong Xu; Xiaowei Chen; Xin Wang

Ethereum is one of the most popular blockchain systems; it supports more than half a million transactions every day and fosters miscellaneous decentralized applications with its Turing-complete smart contract machine. However, it remains unclear what the transaction pattern of Ethereum is and how it evolves over time. In this paper, we study the evolutionary behavior of Ethereum transactions from a temporal graph point of view. We first develop a data analytics platform to collect external transactions associated with users as well as internal transactions initiated by smart contracts. Three types of temporal graphs, user-to-user, contract-to-contract and user-contract graphs, are constructed according to trading relationships and are segmented with an appropriate time window. We observe a strong correlation between the size of the user-to-user transaction graph and the average Ether price in a time window, while no evidence of such a linkage is found for the average degree, average edge weights and average triplet closure duration. The macroscopic and microscopic burstiness of Ethereum transactions is validated. We analyze the Gini indexes of the transaction graphs and of user wealth, and find that Ethereum has been very unfair since the very beginning; in a sense, "the rich are already very rich".
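
The Gini index used above can be computed from sorted values with the standard closed form $G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_i x_i} - \frac{n+1}{n}$, where $x_{(i)}$ are the values in ascending order; $G=0$ means perfect equality and values approaching 1 mean wealth concentrated in very few accounts. A minimal sketch with made-up balances (not Ethereum data):

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention: no wealth means no measurable inequality
    # Rank-weighted sum with ranks starting at 1 for the smallest value.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))    # 0.0
print(gini([0, 0, 0, 100]))  # 0.75
```

With $n$ accounts the maximum attainable value is $(n-1)/n$, reached when a single account holds everything, as in the second example.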

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Reshef Meir; Gal Shahaf; Ehud Shapiro; Nimrod Talmon

Voting rules may fail to implement the will of the society when only some voters actively participate, and/or in the presence of sybil (fake or duplicate) voters. Here we aim to address social choice in the presence of sybils and the absence of full participation. To do so, we assume the status quo (Reality) as an ever-present distinguished alternative, and study Reality Enforcing voting rules, which add virtual votes in support of the status quo. We measure the tradeoff between safety and liveness (the ability of active honest voters to maintain and to change the status quo, respectively) in a variety of domains, and show that Reality Enforcing voting is optimal.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Zhen Liu; Hu Li; Chao Wang

Tie strength prediction, sometimes called weight prediction, is vital in exploring the diversity of connectivity patterns that emerge in networks. Due to its fundamental significance, it has drawn much attention in the field of network analysis and mining. Related works that have appeared in recent years have significantly advanced our understanding of how to predict strong and weak ties in social networks. However, most of the proposed approaches are scenario-aware methods that depend heavily on special contexts and are even used exclusively in social networks. As a result, they are less applicable to other kinds of networks. In contrast to prior studies, here we propose a new computational framework called Neighborhood Estimating Weight (NEW), which is driven purely by the basic structural information of the network and has the flexibility to adapt to diverse types of networks. In NEW, we design a novel index, connection inclination, to generate representative features of the network that are capable of capturing the actual distribution of tie strength. To obtain optimized prediction results, we also propose a parameterized regression model that has approximately linear time complexity and is thus readily applicable to large-scale networks. The experimental results on six real-world networks demonstrate that our proposed predictive model outperforms state-of-the-art methods and is powerful for predicting missing tie strengths when only part of the network's tie strength information is available.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Camille Roth; Antoine Mazières; Telmo Menezes

The role of recommendation algorithms in online user confinement is at the heart of a fast-growing literature. Recent empirical studies generally suggest that filter bubbles may principally be observed in the case of explicit recommendation (based on user-declared preferences) rather than implicit recommendation (based on user activity). We focus on YouTube, which has become a major online content provider but where confinement has until now been little studied in a systematic manner. Starting from a diverse set of seed videos, we first describe the properties of the sets of suggested videos in order to design a sound exploration protocol able to capture latent recommendation graphs recursively induced by these suggestions. These graphs form the background of potential user navigations along non-personalized recommendations. From there, be it in topological, topical or temporal terms, we show that the landscape of what we call mean-field YouTube recommendations is often prone to confinement dynamics. Moreover, the most confined recommendation graphs (i.e., potential bubbles) seem to be organized around sets of videos that garner the highest audience and thus plausibly the most viewing time.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Shuqi Xu; Manuel Sebastian Mariani; Linyuan Lü; Matúš Medo

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
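
A minimal sketch of the idea behind suppressing age bias: compute PageRank on the citation graph, then rescale each paper's score as a z-score relative to papers published around the same time, in the spirit of rescaled PageRank. The damping factor 0.85, the window size, and the function names are assumptions for illustration; the transformation evaluated in the paper may be defined differently.

```python
import statistics


def pagerank(citations, n, d=0.85, iters=100):
    """Power-iteration PageRank. citations: list of (citing, cited), ids 0..n-1.

    Score flows from citing to cited papers; dangling papers (no references)
    spread their score uniformly over all papers.
    """
    out = [[] for _ in range(n)]
    for src, dst in citations:
        out[src].append(dst)
    p = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(p[i] for i in range(n) if not out[i])
        new = [(1 - d) / n + d * dangling / n] * n
        for i in range(n):
            for j in out[i]:
                new[j] += d * p[i] / len(out[i])
        p = new
    return p


def age_rescaled(scores, order, window=3):
    """z-score each paper against the `window` papers nearest in publication order.

    order: paper ids sorted from oldest to newest (hypothetical helper).
    """
    n = len(order)
    rescaled = {}
    for pos, paper in enumerate(order):
        # Centered window, clipped to the ends of the publication sequence.
        lo = max(0, min(pos - window // 2, n - window))
        peers = [scores[q] for q in order[lo:lo + window]]
        mu, sigma = statistics.mean(peers), statistics.pstdev(peers)
        rescaled[paper] = (scores[paper] - mu) / sigma if sigma > 0 else 0.0
    return rescaled
```

Older papers have had more time to accumulate citations, so raw scores favor them; comparing each paper only with its contemporaries removes most of that advantage before ranking.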

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Peter M. Aronow (Yale); Dean Eckles (MIT); Cyrus Samii (NYU); Stephanie Zonszein (NYU)

We present current methods for estimating treatment effects and spillover effects under "interference", a term which covers a broad class of situations in which a unit's outcome depends not only on treatments received by that unit, but also on treatments received by other units. To the extent that units react to each other, interact, or otherwise transmit effects of treatments, valid inference requires that we account for such interference, which is a departure from the traditional assumption that units' outcomes are affected only by their own treatment assignment. Interference and associated spillovers may be a nuisance or they may be of substantive interest to the researcher. In this chapter, we focus on interference in the context of randomized experiments. We review methods for when interference happens in a general network setting. We then consider the special case where interference is contained within a hierarchical structure. Finally, we discuss the relationship between interference and contagion. We use the interference R package and simulated data to illustrate key points. We consider efficient designs that allow for estimation of the treatment and spillover effects and discuss recent empirical studies that try to capture such effects.
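
One common way to formalize interference in network experiments is an exposure mapping that classifies each unit by its own treatment and by whether any neighbor is treated, giving four exposure conditions. The sketch below shows only that classification step; it is not the API of the `interference` R package, and the function name and the "any treated neighbor" rule are illustrative assumptions.

```python
def exposure_conditions(adjacency, treated):
    """Classify units into four exposure conditions.

    adjacency: dict unit -> iterable of neighboring units.
    treated: set of units assigned to treatment.
    Returns dict unit -> "direct+indirect", "direct", "indirect", or "control".
    """
    labels = {}
    for unit, neighbors in adjacency.items():
        direct = unit in treated                          # own treatment
        indirect = any(v in treated for v in neighbors)   # spillover exposure
        if direct and indirect:
            labels[unit] = "direct+indirect"
        elif direct:
            labels[unit] = "direct"
        elif indirect:
            labels[unit] = "indirect"
        else:
            labels[unit] = "control"
    return labels


# Path graph 1-2-3-4 with only unit 1 treated.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(exposure_conditions(adj, {1}))
```

Because units in different network positions have different probabilities of landing in each condition, comparisons across these groups are typically weighted by the inverse of those exposure probabilities rather than taken as raw differences in means.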

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Xupin Zhang; Hanjia Lyu; Jiebo Luo

Researchers have attempted to measure the success of crowdfunding campaigns using a variety of determinants, such as the descriptions of the crowdfunding campaigns, the amount of funding goals, and crowdfunding project characteristics. Although many success determinants have been reported in the literature, it remains unclear whether the cover photo and the text in the title and description could be combined in a fusion classifier to better predict the crowdfunding campaign's success. In this work, we focus on the performance of the crowdfunding campaigns on GoFundMe over a wide variety of funding categories. We analyze the attributes available at the launch of the campaign and identify attributes that are important for each category of the campaigns. Furthermore, we develop a fusion classifier based on random forest that significantly improves the prediction result, thus suggesting effective ways to make a campaign successful.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-15
Ronshee Chawla; Abishek Sankararaman; Ayalvadi Ganesh; Sanjay Shakkottai

We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications. We develop two novel algorithms in which each agent plays only from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $\Omega(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaboration. Furthermore, we show that the communication constraints only have a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound establishing that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2019-03-14
Piotr Bródka; Katarzyna Musial; Jarosław Jankowski

The world of network science is fascinating and filled with complex phenomena that we aspire to understand. One of them is the dynamics of spreading processes over complex networked structures. Building a knowledge base in a field where we can face more than one spreading process propagating over a network that has more than one layer is a challenging task, as the complexity comes both from the environment in which the spread happens and from the characteristics and interplay of the spreading processes. As this cross-disciplinary field, bringing together computer science, network science, biology and physics, has rapidly grown over the last decade, there is a need to comprehensively review the current state of the art and to offer the research community a roadmap that helps to organise future research in this area. Thus, this survey is a first attempt to present the current landscape of multi-process spread over multilayer networks and to suggest potential ways forward.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2019-08-28
Friso Selten; Cameron Neylon; Chun-Kai Huang; Paul Groth

Pressured by globalization and the increasing demand for public organisations to be accountable, efficient and transparent, university rankings have become an important tool for assessing the quality of higher education institutions. It is therefore important to carefully assess exactly what these rankings measure. In this paper, the three major global university rankings, the Academic Ranking of World Universities, the Times Higher Education and the Quacquarelli Symonds World University Rankings, are studied. After a description of the ranking methodologies, it is shown that university rankings are stable over time but that there is variation between the three rankings. Furthermore, using Principal Component Analysis and Exploratory Factor Analysis, we show that the variables used to construct the rankings primarily measure two underlying factors: a university's reputation and its research performance. By correlating these factors and plotting regional aggregates of universities on the two factors, differences between the rankings are made visible. Last, we elaborate on how the results from these analyses can be viewed in light of often-voiced critiques of the ranking process. This indicates that the variables used by the rankings might not capture the concepts they claim to measure. In doing so, the study provides evidence of the ambiguous nature of university rankings' quantification of university performance.

Updated: 2020-01-16
• arXiv.cs.SI Pub Date : 2020-01-14
Hamidreza Alvari; Ghazaleh Beigi; Soumajyoti Sarkar; Scott W. Ruston; Steven R. Corman; Hasan Davulcu; Paulo Shakarian

Over the past few years, we have observed different media outlets' attempts to shift public opinion by framing information to support a narrative that facilitates their goals. Malicious users referred to as "pathogenic social media" (PSM) accounts are more likely to amplify this phenomenon by spreading misinformation to viral proportions. Understanding the spread of misinformation from an account-level perspective is thus a pressing problem. In this work, we present a feature-driven approach to detect PSM accounts in social media. Inspired by the literature, we set out to assess PSMs from three broad perspectives: (1) user-related information (e.g., user activity, profile characteristics), (2) source-related information (i.e., information linked via URLs shared by users) and (3) content-related information (e.g., tweet characteristics). For the user-related information, we investigate malicious signals using causality analysis (i.e., whether a user is frequently a cause of viral cascades) and profile characteristics (e.g., number of followers). For the source-related information, we explore various malicious properties linked to URLs (e.g., URL address, content of the associated website). Finally, for the content-related information, we examine attributes (e.g., number of hashtags, suspicious hashtags) of tweets posted by users. Experiments on real-world Twitter data from different countries demonstrate the effectiveness of the proposed approach in identifying PSM users.

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-14
Pedro Cisneros-Velarde; Francesco Bullo

We propose a novel network formation game that explains the emergence of various hierarchical structures in groups where self-interested or utility-maximizing individuals decide to establish or sever relationships of authority or collaboration among themselves. We consider two settings: we first consider individuals who do not seek the other party's consent when establishing a relationship and then individuals who do. For both settings, we formally relate the emerging hierarchical structures to the novel inclusion of well-motivated hierarchy-promoting terms in the individuals' utility functions. We first analyze the game via a static analysis and characterize all the hierarchical structures that can be formed as its solutions. We then consider the game played dynamically under stochastic interactions among individuals implementing best-response dynamics and analyze the nature of the converged networks.

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-14

Observability is a fundamental concept in system inference and estimation. This paper focuses on structural observability analysis of Cartesian product networks. Cartesian product networks emerge in a variety of applications, including parallel and distributed systems. We provide a structural approach to extend the structural observability of the constituent networks (referred to as the factor networks) to that of the Cartesian product network. The structural approach is based on graph theory and is generic. We introduce certain structures that are tightly related to the structural observability of networks, namely the parent Strongly-Connected-Component (parent SCC), parent node, and contractions. The results show that for particular types of networks (e.g., networks containing contractions) the structural observability of the factor network can be recovered via the Cartesian product. In other words, if one of the factor networks is structurally rank-deficient, then, using another factor network containing a spanning cycle family, the Cartesian product of the two networks is structurally full-rank. We define certain network structures for structural observability recovery. In addition, we derive the number of observer nodes (nodes whose states are measured by an output) in the Cartesian product network based on the number of observer nodes in the factor networks. An example illustrates the graph-theoretic analysis in the paper.
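
For reference, the Cartesian product $G \square H$ has vertex set $V(G)\times V(H)$, and $(u,v)\to(u',v')$ is an edge when $u=u'$ and $v\to v'$ in $H$, or $v=v'$ and $u\to u'$ in $G$; its adjacency matrix is the Kronecker sum $A_G\otimes I + I\otimes A_H$. A minimal sketch for directed graphs given as edge lists (the function name is hypothetical, and it only builds the product; it does not test structural observability):

```python
from itertools import product


def cartesian_product(nodes_g, edges_g, nodes_h, edges_h):
    """Return (nodes, edges) of the Cartesian product of two directed graphs."""
    edges = []
    # Copies of H: fix a node u of G and follow an edge v -> v2 of H.
    for u, (v, v2) in product(nodes_g, edges_h):
        edges.append(((u, v), (u, v2)))
    # Copies of G: fix a node v of H and follow an edge u -> u2 of G.
    for (u, u2), v in product(edges_g, nodes_h):
        edges.append(((u, v), (u2, v)))
    return list(product(nodes_g, nodes_h)), edges


# G: a 2-cycle a <-> b; H: a single edge 0 -> 1.
nodes, edges = cartesian_product(["a", "b"], [("a", "b"), ("b", "a")], [0, 1], [(0, 1)])
print(len(nodes), len(edges))
```

The product has $|V(G)||V(H)|$ nodes and $|V(G)||E(H)| + |E(G)||V(H)|$ edges (here 4 and 6), which is the structure on which the factor-network conditions above operate.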

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-14
Isuru Udayangani Hewapathirana

A network provides a powerful means of representing complex relationships between entities by abstracting entities as vertices, and relationships as edges connecting vertices in a graph. Beyond the presence or absence of relationships, a network may contain additional information that can be attributed to the entities and their relationships. Attaching these additional attribute data to the corresponding vertices and edges yields an attributed graph. Moreover, in the majority of real-world applications, such as online social networks, financial networks and transactional networks, relationships between entities evolve over time. Change detection in dynamic attributed networks is an important problem in many areas, such as fraud detection, cyber intrusion detection and health care monitoring. It is a challenging problem because it involves a time sequence of attributed graphs, each of which is usually very large and can contain many attributes attached to the vertices and edges, resulting in a complex, high-dimensional mathematical object. In this survey, we provide an overview of some of the existing change detection methods that utilize attribute information. We categorize these methods based on the levels of structure in the graph that are exploited to detect changes. These levels are vertices, edges, subgraphs, communities and the overall graph. We focus our attention on the strengths and weaknesses of these methods, including performance and scalability. Finally, we discuss some publicly available dynamic network datasets and give a brief overview of simulation models to generate synthetic dynamic attributed networks.

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-14
Manuel S. Mariani; Yanina Gimenez; Jorge Brea; Martin Minnoni; René Algesheimer; Claudio J. Tessone

Can we predict the future success of a product, service, or business by monitoring the behavior of a small set of individuals? A positive answer would have important implications for the science of success and managerial practices, yet recent works have supported diametrically opposite answers. To resolve this tension, we address this question in a unique, large-scale dataset that combines individuals' purchasing history with their social and mobility traits across an entire nation. Surprisingly, we find that the purchasing history alone enables the detection of small sets of "discoverers" whose early purchases consistently predict success. In contrast with the assumptions by most existing studies on word-of-mouth processes, the social hubs selected by network centrality are not consistently predictive of success. Our approach to detect key individuals has promise for applications in other research areas including science of science, technological forecasting, and behavioral finance.

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-14
Jonathan Brophy; Daniel Lowd

Social networking websites face a constant barrage of spam, unwanted messages that distract, annoy, and even defraud honest users. These messages tend to be very short, making them difficult to identify in isolation. Furthermore, spammers disguise their messages to look legitimate, tricking users into clicking on links and tricking spam filters into tolerating their malicious behavior. Thus, some spam filters examine relational structure in the domain, such as connections among users and messages, to better identify deceptive content. However, even when it is used, relational structure is often exploited in an incomplete or ad hoc manner. In this paper, we present Extended Group-based Graphical models for Spam (EGGS), a general-purpose method for classifying spam in online social networks. Rather than labeling each message independently, we group related messages together when they have the same author, the same content, or other domain-specific connections. To reason about related messages, we combine two popular methods: stacked graphical learning (SGL) and probabilistic graphical models (PGM). Both methods capture the idea that messages are more likely to be spammy when related messages are also spammy, but they do so in different ways; SGL uses sequential classifier predictions and PGMs use probabilistic inference. We apply our method to four different social network domains. EGGS is more accurate than an independent model in most experimental settings, especially when the correct label is uncertain. For the PGM implementation, we compare Markov logic networks to probabilistic soft logic and find that both work well with neither one dominating, and the combination of SGL and PGMs usually performs better than either on its own.
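
The stacked graphical learning (SGL) idea of letting related messages inform each other can be sketched as follows, assuming a first-stage classifier has already produced a spam probability for every message and that messages are grouped by a shared key such as the author. The function names, the leave-one-out group mean, and the fixed 0.5 blending weight are hypothetical stand-ins; EGGS itself trains a second-stage classifier on such relational features and can alternatively use PGMs (Markov logic networks or probabilistic soft logic).

```python
from collections import defaultdict

def relational_features(probs, group_of):
    """For each message, the mean first-stage spam probability of the *other*
    messages in its group (leave-one-out), or None if it is alone."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for msg, p in probs.items():
        totals[group_of[msg]] += p
        counts[group_of[msg]] += 1
    feats = {}
    for msg, p in probs.items():
        g = group_of[msg]
        others = counts[g] - 1
        feats[msg] = (totals[g] - p) / others if others else None
    return feats

def stacked_scores(probs, group_of, weight=0.5):
    """Hypothetical second stage: blend each message's own score with its
    group's relational feature (a learned classifier would go here)."""
    feats = relational_features(probs, group_of)
    return {
        msg: p if feats[msg] is None else (1 - weight) * p + weight * feats[msg]
        for msg, p in probs.items()
    }

probs = {"m1": 0.9, "m2": 0.8, "m3": 0.4, "m4": 0.1}
group_of = {"m1": "alice", "m2": "alice", "m3": "alice", "m4": "bob"}
scores = stacked_scores(probs, group_of)
print(round(scores["m3"], 3))  # 0.625: pulled up by alice's other, spammy messages
```

Message m3 looks borderline in isolation, but its author's other messages score high, so the relational feature raises it; m4 has no related messages and keeps its original score.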

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2018-12-04
Felipe S. Abrahão; Klaus Wehmuth; Hector Zenil; Artur Ziviani

The study of complex networks has found several applications to real-world networks. Consequently, there is a growing demand for new graph abstractions that can deal with multidimensional structures and their complexities. This article presents a theoretical investigation of incompressible multidimensional networks defined by generalized graph representations. In particular, we mathematically study the incompressibility (i.e., algorithmic randomness) of snapshot-dynamic networks and multiplex networks in comparison to the incompressibility of more general forms of multidimensional networks, of which snapshot-dynamic networks and multiplex networks are particular cases. In addition, from a worst-case compressibility analysis, we study some of the topological properties of general multidimensional networks. To these ends, we first show that incompressible snapshot-dynamic (or multiplex) networks carry an amount of algorithmic information that is linearly dominated by the size of the set of time instants (or layers). This contrasts with the algorithmic information carried by incompressible general dynamic (or multilayer) networks, which is of quadratic order in the size of the set of time instants (or layers). Furthermore, we show that such incompressible general multidimensional networks have very short diameter, high k-connectivity, and degrees of the order of half of the network size within a strong-asymptotically dominated standard deviation. Then, we show that incompressible general multidimensional networks have transtemporal (crosslayer or, in general, non-sequential interdimensional) edges, i.e., edges linking vertices at non-sequential time instants (layers or, in general, elements of a node dimension).

Updated: 2020-01-15
• arXiv.cs.SI Pub Date : 2020-01-11
Jie Huang; Fanghua Ye; Xu Chen

A game process is a system where the decisions of one agent can influence the decisions of other agents. In the real world, social influences and relationships between agents may influence the decision-making of agents with game behaviors. In turn, this makes it possible to mine information about such agents, such as the relationships between them, from their interactions in a game process. In this paper, we propose a Game Generative Network (GGN) framework, which uses the deviation between the real game outcome and the ideal game model to build networks for game processes, opening the door to understanding more about agents with game behaviors through graph mining approaches. We apply GGN to team games as a concrete application and conduct experiments on relationship inference tasks.

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2020-01-13
Yekai Xu; Zuofang Wan; Qingqian He; Shiguang Ni

This project describes an approach to analyzing public sentiment with social media data and provides an example of the Twitter discourse during the 2019 Chinese National Day. The objective is to study the online discourse towards China with NLP algorithms, as well as to observe the temporal, spatial and lingual characteristics of the expressed sentiments. First, Twitter data sets were collected between Sept 30 and Oct 3 through the Twitter API, and part of them was manually labeled to train an SVM. Then, a hybrid method combining the SVM with a dictionary was applied to evaluate the sentiments of the collected tweets. After that, the temporal fluctuation and spatial distribution of the tweets' sentiments, together with the most frequently used words, were presented. Finally, we conclude by highlighting the possible consequences of the overall negative image of China in English-speaking discourse and indicating future directions.

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2020-01-13
Béatrice Mazoyer (MICS); Nicolas Hervé (INA); Céline Hudelot (MICS); Julia Cage (ECON)

In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising in many applications, are not well suited to this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we provide a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.
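
To make "event detection as dynamic clustering with tf-idf vectors" concrete, here is a minimal single-pass (first-story-detection style) sketch. It assumes whitespace tokenization, IDF computed over the whole batch for simplicity (smoothed as ln((1 + n)/(1 + df)) + 1, a common convention), and a hypothetical cosine threshold of 0.3: each tweet joins the cluster of its most similar earlier tweet if the similarity reaches the threshold and otherwise opens a new cluster, i.e., a candidate new event. It is an illustration, not the authors' exact pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse tf-idf vectors (dicts), L2-normalized, with smoothed idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def single_pass_cluster(docs, threshold=0.3):
    """Assign each doc, in stream order, to the cluster of its nearest earlier
    doc if similarity >= threshold, else open a new cluster."""
    vectors = tfidf_vectors(docs)
    labels = []
    for i, vec in enumerate(vectors):
        best_j, best_sim = None, -1.0
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(max(labels, default=-1) + 1)
    return labels

tweets = [
    "earthquake hits city center",
    "strong earthquake hits the city",
    "football final tonight",
    "who wins the football final",
]
print(single_pass_cluster(tweets))  # [0, 0, 1, 1]
```

A streaming system would update document frequencies incrementally and compare only against a recent window of tweets; the batch version keeps the sketch short.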

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2020-01-13
Hedongliang Liu; Hengjia Wei; Sven Puchinger; Antonia Wachter-Zeh; Moshe Schwartz

We study scalar-linear and vector-linear solutions to the generalized combination network. We derive new upper and lower bounds on the maximum number of nodes in the middle layer, depending on the network parameters. These bounds improve and extend the parameter range of known bounds. Using these new bounds we present a general lower bound on the gap in the alphabet size between scalar-linear and vector-linear solutions.

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2018-05-08
Mengbin Ye; Minh Hoang Trinh; Young-Hun Lim; Brian D. O. Anderson; Hyo-Sung Ahn

In this paper, inspired by the recent discrete-time model in [1,2], we study two continuous-time opinion dynamics models (Model 1 and Model 2) in which individuals discuss opinions on multiple logically interdependent topics. The logical interdependence between the different topics is captured by a 'logic' matrix, which is distinct from the Laplacian matrix capturing interactions between individuals. For each of Model 1 and Model 2, we obtain a necessary and sufficient condition for the network to reach a consensus on each separate topic. The condition for Model 1 involves a combination of the eigenvalues of the logic matrix and the Laplacian matrix, whereas the condition for Model 2 requires only separate conditions on the logic matrix and the Laplacian matrix. Further investigation of Model 1 yields two sufficient conditions for consensus and allows us to conclude that one way to guarantee a consensus is to reduce the rate of interaction between individuals exchanging opinions. By placing further restrictions on the logic matrix, we also establish a set of Laplacian matrices that guarantee consensus for Model 1. The two models are also extended to include stubborn individuals, who remain attached to their initial opinions. Sufficient conditions are obtained for guaranteeing convergence of the opinion dynamics system, with the final opinions generally being at a persistent disagreement. Simulations are provided to illustrate the results.
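
For orientation, the sketch below simulates what "consensus on each separate topic" means in the simplest setting: standard continuous-time Laplacian consensus applied independently to each of m topics, dx/dt = -(L ⊗ I_m)x. This is an assumed baseline with no logical coupling between topics, not Model 1 or Model 2 from the paper, which add a logic matrix. It uses forward-Euler integration with a hypothetical step size on an undirected, connected path graph, for which every topic converges to the average of its initial opinions.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def simulate(adj, x0, dt=0.05, steps=2000):
    """Forward-Euler integration of dx_i/dt = -sum_j L_ij x_j, applied
    topic by topic. x0[i] is individual i's opinion vector over m topics."""
    lap = laplacian(adj)
    n, m = len(x0), len(x0[0])
    x = [row[:] for row in x0]
    for _ in range(steps):
        dx = [[-sum(lap[i][j] * x[j][p] for j in range(n)) for p in range(m)]
              for i in range(n)]
        x = [[x[i][p] + dt * dx[i][p] for p in range(m)] for i in range(n)]
    return x

# Path graph 0 - 1 - 2; two topics per individual.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x0 = [[1.0, 0.0],
      [0.0, 0.3],
      [0.5, 0.9]]
final = simulate(adj, x0)
print([[round(v, 3) for v in row] for row in final])
# each topic settles at its initial average: 0.5 for topic 0, 0.4 for topic 1
```

The step size must satisfy dt < 2 / λ_max(L) for Euler stability (λ_max = 3 for this path graph); a logic matrix would couple the topic columns and change which opinions, if any, are reached.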

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2018-06-11
Bin Liu

In this paper, we are concerned with trust modeling for agents in networked computing systems. As trust is a subjective notion that is invisible, implicit and uncertain in nature, many attempts have been made to model trust with the aid of Bayesian probability theory, yet the field lacks a global, comprehensive analysis of the variants of Bayesian trust models. We present a study to fill this gap by giving a comprehensive review of the literature. A generic Bayesian trust (GBT) modeling perspective is highlighted here. It is shown that all models under survey can be cast into a GBT-based computing paradigm as special cases. We discuss both the capabilities and the limitations of the GBT perspective and point out open questions, in the hope of advancing GBT into a pragmatic infrastructure for analyzing intrinsic relationships among variants of trust models and developing novel tools for trust evaluation.
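
One of the simplest Bayesian trust models, shown here as an assumed example rather than the paper's GBT formulation, treats each interaction with an agent as a Bernoulli trial with unknown success probability θ and places a Beta(a, b) prior on θ. After s successful and f failed interactions the posterior is Beta(a + s, b + f), and its mean serves as the trust score. The class name and the uniform prior a = b = 1 are hypothetical defaults.

```python
from dataclasses import dataclass

@dataclass
class BetaTrust:
    """Beta-Bernoulli trust in one agent; a and b are prior pseudo-counts."""
    a: float = 1.0
    b: float = 1.0

    def update(self, successes: int, failures: int) -> None:
        # Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta(a + s, b + f).
        self.a += successes
        self.b += failures

    def trust(self) -> float:
        # Posterior mean E[theta] = a / (a + b).
        return self.a / (self.a + self.b)

    def variance(self) -> float:
        # Posterior variance, a rough measure of how uncertain the trust is.
        total = self.a + self.b
        return self.a * self.b / (total * total * (total + 1))

t = BetaTrust()
t.update(successes=8, failures=2)
print(t.trust())  # 0.75
```

The variance captures the "uncertain in nature" aspect: the same 0.75 mean backed by 80 successes and 20 failures would have a much smaller variance than one backed by 8 and 2.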

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2018-07-01
Michael Simpson; Venkatesh Srinivasan; Alex Thomo

In this work, we consider misinformation propagating through a social network and study the problem of its prevention. In this problem, a "bad" campaign starts propagating from a set of seed nodes in the network and we use the notion of a limiting (or "good") campaign to counteract the effect of misinformation. The goal is to identify a set of $k$ users that need to be convinced to adopt the limiting campaign so as to minimize the number of people that adopt the "bad" campaign at the end of both propagation processes. This work presents \emph{RPS} (Reverse Prevention Sampling), an algorithm that provides a scalable solution to the misinformation mitigation problem. Our theoretical analysis shows that \emph{RPS} runs in $O((k + l)(n + m)(\frac{1}{1 - \gamma}) \log n / \epsilon^2 )$ expected time and returns a $(1 - 1/e - \epsilon)$-approximate solution with at least $1 - n^{-l}$ probability (where $\gamma$ is a typically small network parameter and $l$ is a confidence parameter). The time complexity of \emph{RPS} substantially improves upon the previously best-known algorithms that run in time $\Omega(m n k \cdot POLY(\epsilon^{-1}))$. We experimentally evaluate \emph{RPS} on large datasets and show that it outperforms the state-of-the-art solution by several orders of magnitude in terms of running time. This demonstrates that misinformation mitigation can be made practical while still offering strong theoretical guarantees.
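
RPS builds on the reverse-sampling idea used in influence maximization. The sketch below shows only that general idea, plain reverse influence sampling (RIS) under the independent cascade model followed by greedy maximum coverage, and is not RPS itself: it ignores the competing "bad" campaign, and the graph, edge probabilities, and sample count are hypothetical. Greedy coverage over sampled reverse-reachable (RR) sets is the step from which (1 - 1/e - ε)-style guarantees are typically derived in this family of algorithms.

```python
import random
from collections import deque

def reverse_reachable_set(in_edges, root, rng):
    """Nodes that reach `root` in one random live-edge sample of the
    independent cascade model (each edge u->v is live with probability p)."""
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            if u not in seen and rng.random() < p:
                seen.add(u)
                queue.append(u)
    return seen

def ris_seeds(nodes, edges, k, num_samples=1000, seed=0):
    """Greedy max coverage over sampled reverse-reachable (RR) sets."""
    rng = random.Random(seed)
    in_edges = {}
    for u, v, p in edges:
        in_edges.setdefault(v, []).append((u, p))
    rr_sets = [reverse_reachable_set(in_edges, rng.choice(nodes), rng)
               for _ in range(num_samples)]
    chosen, covered = [], [False] * len(rr_sets)
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for node in rr:
                    counts[node] = counts.get(node, 0) + 1
        if not counts:  # every sampled RR set is already covered
            break
        best = max(counts, key=counts.get)
        chosen.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return chosen

# Hypothetical graph: node 0 influences leaves 1-4; node 5 influences node 6.
nodes = list(range(7))
edges = [(0, leaf, 1.0) for leaf in range(1, 5)] + [(5, 6, 1.0)]
print(ris_seeds(nodes, edges, k=2))  # [0, 5]
```

Each RR set is sampled in time proportional to the edges it explores, which is why this approach scales to large graphs compared with repeatedly simulating forward cascades.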

Updated: 2020-01-14
• arXiv.cs.SI Pub Date : 2019-06-11
Shikang Liu; Fatemeh Vahedian; David Hachen; Omar Lizardo; Christian Poellabauer; Aaron Striegel; Tijana Milenkovic

Depression and anxiety are critical public health issues affecting millions of people around the world. To identify individuals who are vulnerable to depression and anxiety, predictive models have been built that typically utilize data from one source. Unlike these traditional models, in this study we leverage a rich heterogeneous data set from the University of Notre Dame's NetHealth study that collected individuals' (student participants') social interaction data via smartphones, health-related behavioral data via wearables (Fitbit), and trait data from surveys. To integrate the different types of information, we model the NetHealth data as a heterogeneous information network (HIN). Then, we redefine the problem of predicting individuals' mental health conditions (depression or anxiety) in a novel manner, as applying to our HIN a popular paradigm of a recommender system (RS), which is typically used to predict the preference that a person would give to an item (e.g., a movie or book). In our case, the items are the individuals' different mental health states. We evaluate four state-of-the-art RS approaches. We also model the prediction of individuals' mental health as another problem type, namely node classification (NC) in our HIN, evaluating in the process four node features under logistic regression as a proof-of-concept classifier. We find that our RS and NC network methods produce more accurate predictions than both a logistic regression model using the same NetHealth data in the traditional non-network fashion and a random approach. We also find that the best of the considered RS approaches outperforms all considered NC approaches. This is the first study to integrate smartphone, wearable sensor, and survey data in an HIN manner and use RS or NC on the HIN to predict individuals' mental health conditions.

Updated: 2020-01-14
Contents have been reproduced by permission of the publishers.
