An unsupervised approach to detect review spam using duplicates of images, videos and Chinese texts
Introduction
People tend to believe things that are supported by photos. Because they are intuitive, graphs (here, images and videos) have long been encouraged by e-business and opinion-sharing websites, and they have recently become common on such sites. For instance, over 8% of the product reviews on TMALL.com (one of the two largest business-to-consumer (B2C) retailers in China, owned by Alibaba; referred to as TMALL in this paper) contain images or videos. On average, each review on JD.com (the other of the two largest B2C retailers in China; referred to as JD in the following) has 1.05 photos and 0.5 videos. Moreover, graphic experiences are often promoted as headlines or recommended with higher priority. As with text, people share pictures to guide others, to express their intelligence, to connect with people or simply to earn platform credits/coupons (Dellarocas and Narayan, 2006; Hennig-Thurau et al., 2004; Hu et al., 2011; Zhu and Zhang, 2010). Likewise, images can be used to lie with the objective of misleading potential customers; this constitutes a new type of opinion spam, distinct from conventional electronic word-of-mouth. Even worse, since graphs usually seem more reliable than plain text, people are expected to be fooled by this type of spam more often than by text manipulation. Over the last decade, opinion spam has drawn considerable attention, and there have been substantial achievements on the topic (Deborah and Baron, 1988; Dellarocas and Narayan, 2006; Heydari et al., 2015; Jindal and Liu, 2007, 2008; Jindal et al., 2010; Li et al., 2019; Lim, 2010, 2010; Liu and Pang, 2018; Mayzlin, 2006; Mukherjee et al., 2013; Ott et al., 2012, 2011; Paul Rayson, 2001; Savage et al., 2015; Somayeh, 2013; Xie et al., 2012; Zhang et al., 2019; Zhang et al., 2018); nevertheless, most of this work is based on natural language processing and thus cannot handle graphic experiences.
An investigation of reviews hosted by JD and TMALL shows that both graph- and text-oriented duplication are common and that reviews are duplicated in multiple ways. For instance, spammers tend to borrow photos from item introduction pages, copy and paste videos from other posts and/or refer to a specific scenario in their texts. To investigate this further, in this paper we propose an approach that addresses the duplication of texts, images and videos simultaneously; the two main challenges are recognizing the different kinds of duplication and labeling spam, especially compound spam.
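When a photo or video is reposted as-is, its file bytes are identical to the original, so one simple way to surface such copies is to group media by a cryptographic hash of their contents. The sketch below is a hypothetical illustration, not the authors' method: the function names and the `media` mapping (media ID to raw file bytes) are assumptions. Exact hashing only catches files that were not re-encoded; if a platform recompresses uploads, a perceptual hash would be needed instead.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(media: dict[str, bytes]) -> list[list[str]]:
    """Group media IDs whose file bytes are identical.

    `media` maps a hypothetical media ID (e.g. "r42/img1") to the raw bytes
    of a downloaded image or video. Only groups with two or more members are
    returned, i.e. candidates for copied media.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for media_id, data in media.items():
        buckets[content_hash(data)].append(media_id)
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = {
        "r1/img1": b"same-bytes",
        "r2/img1": b"same-bytes",
        "r3/vid1": b"other-bytes",
    }
    print(group_exact_duplicates(sample))  # [['r1/img1', 'r2/img1']]
```

Hashing keeps the comparison linear in the number of files, since each file is digested once and duplicates fall into the same bucket instead of being compared pairwise.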
To our knowledge, this is the first study to fully address images and videos in the context of review spam detection. Although fields such as image forensics and video forgery detection have studied the tampering problem for years with plentiful techniques, manipulation in review systems is different, and these solutions cannot be adopted directly, for two reasons: 1) for profit efficiency, spammers opt to steal and repost other people's images/videos without pixel manipulation or frame editing; and 2) spammers prefer to borrow marketing pictures from the item's webpage, which are carefully designed by sellers and always have pure white (0xFFFFFF) or black (0x000000) backgrounds. Specifically, the contribution of this paper is threefold: 1) we consider both texts and graphs to uncover review spam, 2) we introduce reasonable criteria for detecting different types of duplication and 3) we report some interesting phenomena.
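The observation about marketing pictures suggests a cheap screening step. A minimal sketch, assuming the image has already been decoded (for example with Pillow) into a list of rows of (R, G, B) tuples: it checks whether most border pixels are exactly 0xFFFFFF or exactly 0x000000. Sampling only the border, the 0.9 ratio and the function names are illustrative assumptions, not criteria taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
WHITE: Pixel = (0xFF, 0xFF, 0xFF)  # 0xFFFFFF
BLACK: Pixel = (0x00, 0x00, 0x00)  # 0x000000


def border_pixels(image: Sequence[Sequence[Pixel]]) -> List[Pixel]:
    """Collect the pixels on the outermost rows and columns of an image
    given as a list of rows, each row a list of (R, G, B) tuples."""
    pixels = list(image[0])
    if len(image) > 1:
        pixels.extend(image[-1])
    for row in image[1:-1]:
        pixels.append(row[0])
        if len(row) > 1:
            pixels.append(row[-1])
    return pixels


def has_pure_background(image: Sequence[Sequence[Pixel]],
                        min_ratio: float = 0.9) -> bool:
    """Heuristic: True if at least `min_ratio` of the border pixels are
    exactly white, or at least `min_ratio` are exactly black."""
    border = border_pixels(image)
    white = sum(1 for p in border if p == WHITE) / len(border)
    black = sum(1 for p in border if p == BLACK) / len(border)
    return max(white, black) >= min_ratio


if __name__ == "__main__":
    product_shot = [[WHITE] * 5 for _ in range(5)]
    product_shot[2][2] = (200, 30, 30)  # the item itself, in the centre
    print(has_pure_background(product_shot))  # True
    user_photo = [[(120, 110, 90)] * 5 for _ in range(5)]
    print(has_pure_background(user_photo))  # False
```

Exact equality is used deliberately: studio backgrounds are reported as pure colours, whereas ordinary user photos rarely contain long runs of exactly 0xFFFFFF or 0x000000 pixels.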
The remainder of this paper is organized as follows. First, we survey state-of-the-art studies in Section 2, and then we introduce our proposal in Section 3. After that, we conduct and discuss some experiments in Section 4. Finally, we conclude this paper in Section 5.
Related work
The problem of review spam has attracted considerable attention in the past decade. Many studies have been conducted to detect spam reviews, spammers or spammer groups. Here, we group and survey related work under two topics: text and graphs.
Proposal
Based on previous studies (Hennig-Thurau et al., 2004; Jindal and Liu, 2007, 2008), we adopt duplication as the criterion for recognizing opinion spam. Specifically, six kinds of duplication are considered across image-, video- and text-based reviews (see Table 2). Building on these investigations, we propose a lightweight approach.
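The detailed criteria of Table 2 are not reproduced in this excerpt. As one commonly used way to flag duplicated Chinese review text, the sketch below compares character bigram sets with Jaccard similarity; character n-grams avoid word segmentation, which matters because Chinese text has no spaces between words. This is a swapped-in technique for illustration: the normalization rule, the 0.8 threshold and all function names are hypothetical.

```python
import re
from typing import Set


def normalize(text: str) -> str:
    """Remove whitespace and punctuation (Chinese or ASCII) so that trivial
    edits, such as replacing a full-width comma with a space, do not hide a copy."""
    return re.sub(r"[\W_]+", "", text)


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of the normalized text."""
    text = normalize(text)
    if len(text) < n:
        # Very short texts: fall back to the whole string (or nothing).
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the two texts' character n-gram sets."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    union = sa | sb
    if not union:
        return 0.0  # two empty reviews are not treated as duplicates
    return len(sa & sb) / len(union)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair of reviews as duplicated text (hypothetical threshold)."""
    return jaccard(a, b) >= threshold


if __name__ == "__main__":
    a = "质量很好，物流很快，非常满意！"
    b = "质量很好 物流很快 非常满意"
    print(jaccard(a, b))  # 1.0
    print(is_near_duplicate(a, "颜色偏暗，尺码偏小。"))  # False
```

Set-based similarity ignores word order within the review, so it tolerates reordered sentences; a lower threshold catches looser paraphrases at the cost of more false positives among short, formulaic reviews.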
Dataset
For the dataset, we crawl data from JD and TMALL in accordance with their respective data policies. JD and TMALL are the top two B2C websites in China: according to a recent report published by iiMedia Research, together they held 83.8% of the retail market in the first half of 2018 (iiMedia, 2018). We also chose these companies because they presumably have advanced regulations regarding review spam. To fully examine the reviews they host, we conduct several investigations.
Conclusions
To address the lack of serious consideration of graphic reviews in the field of review spam detection, we conduct a comprehensive study that covers six types of duplication pertaining to images, videos and texts. Using datasets crawled from JD and TMALL, we verify the feasibility of our approach and arrive at some interesting conclusions: 1) graphic spam is as severe as text spam; 2) the replication of photos from other posts is more prevalent among
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under grants 61802247 and 61801285. We thank the anonymous reviewers for their constructive comments.
References (55)
- Aligned and non-aligned double JPEG detection using convolutional neural networks. J. Vis. Commun. Image Represent. (2017).
- Design, synthesis and biological evaluation of novel nitric oxide-donating protoberberine derivatives as antitumor agents. Eur. J. Med. Chem. (2017).
- Face image manipulation detection based on a convolutional neural network. Expert Syst. Appl. (2019).
- Electronic word-of-mouth via consumer-opinion platforms: what motivates consumers to articulate themselves on the internet? J. Interact. Mark. (2004).
- Detection of review spam: a survey. Expert Syst. Appl. (2015).
- Manipulation in digital word-of-mouth: a reality check for book reviews. Decis. Support Syst. (2011).
- A unified framework for detecting author spamicity by modeling review deviation. Expert Syst. Appl. (2018).
- Detection of opinion spam based on anomalous rating deviation. Expert Syst. Appl. (2015).
- Online ballot stuffing: influence of self-boosting manipulation on rating dynamics in online rating systems. Telemat. Inform. (2019).
- Agarwal, S., El-Gaaly, T., Farid, H., & Lim, S.-N. (2020). Detecting deep-fake videos from appearance and...
- Detecting deception through linguistic analysis.
- Contrast enhancement-based forensics in digital images. IEEE Trans. Inf. Forensics Secur.
- Illumination-based texture descriptor and fruitfly support vector neural network for image forgery detection in face images. IET Image Process.
- Ambiguity and rationality. J. Behav. Decis. Mak.
- Exposing digital forgeries from JPEG ghosts. IEEE Trans. Inf. Forensics Secur.
- Deepfake video detection using recurrent neural networks.
- Fake colorized image detection. IEEE Trans. Inf. Forensics Secur.
- Low-complexity features for JPEG steganalysis using undecimated DCT. IEEE Trans. Inf. Forensics Secur.
- Fighting fake news: image splice detection via learned self-consistency.
- China Retail Industry Market Research and Business Investment Decision Report.
- Analyzing and detecting review spam.
- Opinion spam and analysis.
- Finding unusual review patterns using unexpected rules.