Making social networks more human: A topological approach
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2019-07-24, DOI: 10.1002/sam.11420
Jonathan W. Berry, Cynthia A. Phillips, Jared Saia

A key problem in social network analysis is identifying nonhuman interactions. State-of-the-art bot-detection systems such as Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on datasets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method takes publicly available electronic social network datasets, such as those in the Stanford Network Analysis Project (SNAP) repository, and removes edges that connect nodes exhibiting strong evidence of nonhuman activity. Our methodology is inspired by classic work in evolutionary psychology by Dunbar, which posits upper bounds on the total strength of the set of social connections in which a single human can engage. We model edge strength with Easley and Kleinberg's topological estimate, label a node as a "violator" if the sum of its edge strengths exceeds a Dunbar-inspired bound, and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 follower graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-to-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms, such as node/edge ranking and graph sparsification. Thus, the artificial PageRank inflation contributed by these edges would bias algorithmic output and lead to incorrect decisions based on that output.
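The three steps described in the abstract (estimate edge strength, label violators, remove violator-to-violator edges) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that the Easley and Kleinberg estimate referred to is neighborhood overlap (shared neighbors divided by all other neighbors of either endpoint), treats the graph as undirected, and uses a hypothetical `bound` parameter, since the abstract does not state the numerical value of the Dunbar-inspired bound. All function names are illustrative.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def neighborhood_overlap(adj, u, v):
    """Assumed edge-strength estimate: shared neighbors of u and v divided by
    all neighbors of either endpoint, excluding u and v themselves."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(union) if union else 0.0


def find_violators(adj, bound):
    """Nodes whose summed edge strength exceeds the (hypothetical) bound."""
    return {
        u
        for u, nbrs in adj.items()
        if sum(neighborhood_overlap(adj, u, v) for v in nbrs) > bound
    }


def remove_violator_edges(edges, bound):
    """Drop every edge whose two endpoints are both violators."""
    adj = build_adjacency(edges)
    violators = find_violators(adj, bound)
    kept = [(u, v) for u, v in edges
            if not (u in violators and v in violators)]
    return kept, violators


# Toy input: a 5-node clique (0..4) attached to a short path 4-5-6-7.
clique = [(i, j) for i in range(5) for j in range(i + 1, 5)]
toy_edges = clique + [(4, 5), (5, 6), (6, 7)]
cleaned_edges, toy_violators = remove_violator_edges(toy_edges, bound=3.5)
```

Only edges with violators at both ends are removed, so a violator's links to ordinary nodes survive the cleaning. On a graph the size of Twitter-2010 the per-edge set intersections would need a far more scalable formulation than this in-memory version.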

Updated: 2019-07-24