Domain-based Latent Personal Analysis and its use for impersonation detection in social media
User Modeling and User-Adapted Interaction ( IF 3.6 ) Pub Date : 2021-08-09 , DOI: 10.1007/s11257-021-09295-7
Osnat Mokryn 1 , Hagit Ben-Shoshan 2

Zipf’s law defines an inverse proportion between a word’s rank in a given corpus and its frequency in that corpus, roughly dividing the vocabulary into frequent and infrequent words. Here, we stipulate that, within a domain, an author’s signature can be derived from, in loose terms, the popular words the author omits and the infrequent words the author uses frequently. We devise a method, termed Latent Personal Analysis (LPA), for finding domain-based attributes for entities in a domain: their distance from the domain and their signature, which captures how they differ most from the domain. We identify the most suitable distance metric for the method among several candidates and construct the distances and personal signatures for authors, the domain’s entities. The signature consists of both over-used terms (compared to the average) and missing popular terms. We validate the correctness and power of the signatures in identifying users and set existence conditions. We test LPA in several domains, both textual and non-textual. We then demonstrate the use of the method in explainable authorship attribution: we define algorithms that utilize LPA to identify two types of impersonation in social media: (1) authors with sockpuppet (multiple) accounts and (2) front-user accounts, operated by several authors. We validate the algorithms and employ them over a large-scale dataset obtained from a social media site with over 4000 users. We corroborate these results using temporal rate analysis. LPA can further be used to devise personal attributes in a wide range of scientific domains in which the constituents have a long-tail distribution of elements.
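
The abstract describes the two LPA outputs, a distance from the domain and a signature, only in words. As a rough illustration, the Python sketch below computes both for a few toy authors. Everything specific in it is an assumption rather than the authors' method: the function names (`normalize`, `domain_profile`, `term_contributions`, `lpa_sketch`) are hypothetical, the domain is taken to be the mean of the authors' normalized term-frequency vectors, and Jensen-Shannon divergence stands in for the distance metric, which the abstract does not name.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw term counts into relative frequencies."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def domain_profile(authors):
    """Assumed domain model: mean of the authors' normalized vectors."""
    profile = Counter()
    for counts in authors.values():
        for term, p in normalize(counts).items():
            profile[term] += p / len(authors)
    return dict(profile)


def term_contributions(author, domain):
    """Per-term Jensen-Shannon contributions in bits (a stand-in metric).

    The sign is added for readability: positive means the author over-uses
    the term relative to the domain, negative means it is under-used or
    missing.
    """
    out = {}
    for term in set(author) | set(domain):
        p, q = author.get(term, 0.0), domain.get(term, 0.0)
        m = (p + q) / 2
        js = 0.0
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
        out[term] = js if p >= q else -js
    return out


def lpa_sketch(counts, domain, k=3):
    """Return (distance from domain, top-k signed signature terms)."""
    contrib = term_contributions(normalize(counts), domain)
    distance = sum(abs(v) for v in contrib.values())
    signature = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    return distance, signature


# Toy data, invented for illustration only.
authors = {
    "a": Counter({"the": 5, "cat": 1}),
    "b": Counter({"the": 5, "dog": 1}),
    "c": Counter({"zebra": 4, "the": 1}),
}
domain = domain_profile(authors)
for name, counts in authors.items():
    distance, signature = lpa_sketch(counts, domain)
    print(name, round(distance, 3), [(t, round(v, 3)) for t, v in signature])
```

In this toy data, author `c` leans heavily on a word the others never use and rarely uses the word the others favor, so both kinds of signature term described in the abstract (over-used and missing popular terms) show up with opposite signs.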




Updated: 2021-08-10