An unsupervised approach to detect review spam using duplicates of images, videos and Chinese texts


Computer Speech & Language (IF 3.1), Pub Date: 2020-12-24, DOI: 10.1016/j.csl.2020.101186
Jiandun Li, Pengpeng Zhang, Liu Yang

Intuitively, image- or video-based recommendations seem more reliable than those containing plain text, and such recommendations have recently become widely encouraged and common across opinion-sharing platforms. Given their potential for manipulation, visual content (e.g., images and videos) is more vulnerable to spam than text. However, most state-of-the-art solutions for opinion spam detection are devoted exclusively to natural language parsing, and less work has addressed photos or videos. After investigating the top two business-to-customer websites, JD.com and TMALL.com, we propose an unsupervised approach that labels suspected spam based on different types of duplication across images, videos and Chinese texts. Experiments verified the effectiveness of this approach and led to several conclusions: 1) image spam is more severe than video and text spam; 2) for manipulation, borrowing material from a marketing page is less attractive than stealing it from other reviewers; 3) in addition to using identical texts, spammers also use fictitious rare incidents to influence customers; and 4) overlapping duplications of images, videos and texts are common.
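The abstract does not spell out how duplicates are detected, so the following is only a minimal sketch of the general idea of flagging reviews that share content, not the authors' method. It assumes exact matching: texts are compared after Unicode normalization and whitespace removal, and images or videos are compared by a hash of their raw bytes, which would miss resized or re-encoded copies. The review record layout (`id`, `text`, `media`) and all function names are hypothetical.

```python
import hashlib
import unicodedata
from collections import defaultdict


def text_key(text):
    """Normalize a review text (NFKC, whitespace removed) and hash it."""
    normalized = "".join(unicodedata.normalize("NFKC", text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def media_key(data):
    """Hash raw image or video bytes; only byte-identical files match."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(reviews):
    """Group review ids by shared content.

    `reviews` is a hypothetical list of dicts with keys
    'id', 'text' (str) and 'media' (list of bytes objects).
    Returns {(kind, key): [review ids]} for content seen in 2+ reviews.
    """
    seen = defaultdict(set)
    for review in reviews:
        if review["text"].strip():
            seen[("text", text_key(review["text"]))].add(review["id"])
        for blob in review["media"]:
            seen[("media", media_key(blob))].add(review["id"])
    return {k: sorted(ids) for k, ids in seen.items() if len(ids) > 1}


def suspected_spam(reviews):
    """Return the sorted ids of reviews sharing any text or media item."""
    duplicates = find_duplicates(reviews)
    return sorted({rid for ids in duplicates.values() for rid in ids})
```

A review is flagged here as soon as one of its texts or media files also appears in another review. A practical system would likely need near-duplicate matching (for example perceptual hashing for images) and a comparison against the product's marketing page, both of which are outside this sketch.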


Updated: 2020-12-30
