An unsupervised approach to detect review spam using duplicates of images, videos and Chinese texts

doi:10.1016/j.csl.2020.101186

Computer Speech & Language

Volume 68, July 2021, 101186

Abstract

Intuitively, image- or video-based recommendations seem more reliable than those containing plain text, and such recommendations have recently become widely encouraged and commonly seen across opinion-sharing platforms. Given their potential for manipulation, graphs (e.g., images and videos) are more vulnerable to spam than plain text. However, most state-of-the-art solutions for opinion spam detection are devoted exclusively to natural language processing, and less work has been done on photos or videos. After investigating the top two business-to-consumer websites, i.e., JD.com and TMALL.com, we propose an unsupervised approach that labels suspected spam based on different types of duplication across images, videos and Chinese texts. Experiments verified the effectiveness of this approach and led to several conclusions: 1) image spam is more severe than video and text spam; 2) for manipulation, borrowing content from a marketing page is less attractive than stealing from other reviewers; 3) in addition to using identical texts, spammers also describe fictitious rare incidents to influence customers; and 4) overlapping duplications of images, videos and texts are common.

Introduction

People tend to believe things that are supported by photos. Because they are intuitive, graphs (including images and videos) have long been encouraged by e-business and opinion-sharing websites, and they have recently become commonplace on such sites. For instance, over 8% of the product reviews on TMALL.com (one of the two largest business-to-consumer (B2C) retailers in China, owned by Alibaba; referred to as TMALL in this paper) contain images or videos. On average, each review on JD.com (the other of the two largest B2C retailers in China; referred to as JD in the following) has 1.05 photos and 0.5 videos. Moreover, graphic experiences are often featured in headlines or recommended with higher priority. As with text, people share pictures to guide others, to express their intelligence, to connect with others or simply to earn platform credits/coupons (Dellarocas and Narayan, 2006; Hennig-Thurau et al., 2004; Hu et al., 2011; Zhu and Zhang, 2010). Likewise, images can be used to lie in order to mislead potential customers; this is a new type of opinion spam, distinct from conventional electronic word-of-mouth. Worse, since graphs tend to seem more reliable than plain text, people are more likely to be fooled by this type of spam than by text manipulation. Over the last decade, opinion spam has drawn considerable attention, and there have been substantial achievements on the topic (Deborah and Baron, 1988; Dellarocas and Narayan, 2006; Heydari et al., 2015; Jindal and Liu, 2007, 2008; Jindal et al., 2010; Li et al., 2019; Lim, 2010, 2010; Liu and Pang, 2018; Mayzlin, 2006; Mukherjee et al., 2013; Ott et al., 2012, 2011; Paul Rayson, 2001; Savage et al., 2015; Somayeh, 2013; Xie et al., 2012; Zhang et al., 2019; Zhang et al., 2018); nevertheless, most of these studies are based on natural language processing and thus cannot handle graphic experiences.

An investigation of reviews hosted by JD and TMALL shows that both graph- and text-oriented duplication are common and that reviews are often duplicated in multiple ways. For instance, spammers tend to borrow photos from introduction pages, copy and paste videos from other posts and/or refer to a specific scenario in their texts. To investigate this further, in this paper we propose an approach that addresses duplication of texts, images and videos simultaneously; the two main challenges are recognizing the different kinds of duplication and labeling spam, especially compound spam.
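The idea of handling text, image and video duplication at once can be illustrated with a minimal sketch. It assumes duplicates are byte-identical copies (as when spammers repost media without editing), so a SHA-256 digest of each text, image and video is enough to group identical content across reviews. It further assumes, purely for illustration, that a review duplicated in two or more modalities counts as compound spam. `Review`, `duplication_flags` and `compound_suspects` are hypothetical names; this is not the authors' implementation.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Review:
    """Hypothetical review record: text plus raw bytes of attached media."""
    review_id: str
    text: str = ""
    images: List[bytes] = field(default_factory=list)
    videos: List[bytes] = field(default_factory=list)


def digest(data: bytes) -> str:
    """SHA-256 of raw bytes; byte-identical copies share a digest."""
    return hashlib.sha256(data).hexdigest()


def duplication_flags(reviews: List[Review]) -> Dict[str, Set[str]]:
    """Map each review id to the modalities in which it shares exact
    content with at least one *other* review."""
    # (modality, digest) -> ids of the reviews containing that content
    owners: Dict[tuple, Set[str]] = defaultdict(set)
    for r in reviews:
        if r.text.strip():
            owners[("text", digest(r.text.strip().encode("utf-8")))].add(r.review_id)
        for img in r.images:
            owners[("image", digest(img))].add(r.review_id)
        for vid in r.videos:
            owners[("video", digest(vid))].add(r.review_id)

    flags: Dict[str, Set[str]] = {r.review_id: set() for r in reviews}
    for (modality, _), ids in owners.items():
        if len(ids) > 1:  # the same content appears in two or more reviews
            for rid in ids:
                flags[rid].add(modality)
    return flags


def compound_suspects(flags: Dict[str, Set[str]]) -> List[str]:
    """Assumption (illustrative only): 'compound' = duplicated in >= 2 modalities."""
    return sorted(rid for rid, mods in flags.items() if len(mods) >= 2)
```

Byte-level hashing is cheap and exact, but it misses re-encoded or resized copies; a perceptual hash would be needed for those cases.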

To our knowledge, this is the first time that images or videos have been fully addressed in the context of review spam detection. Although fields such as image forensics and video forgery detection have tackled the graph tampering problem for years with plentiful techniques, manipulation in review systems is different, and these solutions cannot be adopted directly, for two reasons: 1) to maximize profit efficiency, spammers opt to steal and repost other people's images/videos without pixel manipulation or frame editing; and 2) spammers prefer to borrow marketing pictures from the item's webpage, which are carefully designed by sellers and typically have pure white (0xFFFFFF) or pure black (0x000000) backgrounds. Specifically, the contribution of this paper is threefold: 1) we focus on both texts and graphs to uncover review spam; 2) we introduce reasonable criteria by which to detect different types of duplication; and 3) we report several interesting phenomena.
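The observation about seller-designed pictures with pure white or pure black backgrounds suggests a simple screening heuristic. The sketch below is illustrative only: it assumes images are already decoded into rows of RGB tuples (for instance with Pillow's `Image.open(...).convert("RGB")`), and both the 0.9 border-share threshold and the function names are hypothetical, not values or code from the paper.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
PURE_WHITE: RGB = (0xFF, 0xFF, 0xFF)
PURE_BLACK: RGB = (0x00, 0x00, 0x00)


def border_pixels(pixels: List[List[RGB]]) -> List[RGB]:
    """Outermost ring of an H x W image given as rows of RGB tuples."""
    if not pixels or not pixels[0]:
        return []
    if len(pixels) < 3 or len(pixels[0]) < 3:
        # Too small to have an interior: every pixel lies on the border.
        return [p for row in pixels for p in row]
    ring = list(pixels[0]) + list(pixels[-1])  # top and bottom rows
    for row in pixels[1:-1]:                   # left and right columns
        ring.extend((row[0], row[-1]))
    return ring


def looks_like_marketing_image(pixels: List[List[RGB]], threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: True if at least `threshold` of the border is
    exactly pure white (0xFFFFFF) or exactly pure black (0x000000)."""
    ring = border_pixels(pixels)
    if not ring:
        return False
    white = sum(p == PURE_WHITE for p in ring) / len(ring)
    black = sum(p == PURE_BLACK for p in ring) / len(ring)
    return max(white, black) >= threshold
```

Requiring exact 0xFFFFFF or 0x000000 values keeps false positives low on user photos, whose "white" walls or tables rarely reach pure white, at the cost of missing marketing images that were recompressed with slight color shifts.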

The remainder of this paper is organized as follows. First, we survey state-of-the-art studies in Section 2, and then we introduce our proposal in Section 3. After that, we report and discuss experiments in Section 4. Finally, we conclude this paper in Section 5.

Related work

The problem of review spam has attracted considerable attention in the past decade. Numerous studies have been conducted to detect spam reviews, spammers or spammer groups. Here, we group related work into two topics, text and graph, and survey each in turn.

Proposal

Based on previous studies (Hennig-Thurau et al., 2004; Jindal and Liu, 2007, 2008), we adopt duplication as the criterion for recognizing opinion spam. Specifically, six kinds of duplication are considered across image-, video- and text-based reviews (see Table 2). Building on prior investigations, we propose a lightweight approach.
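Text duplication among Chinese reviews can be illustrated with character n-grams, which sidestep word segmentation because Chinese is written without spaces between words. The following sketch is one possible choice, not necessarily the criterion used in the paper: it compares reviews by the Jaccard similarity of their character bigrams, and the 0.8 threshold and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Drop whitespace and punctuation (including full-width Chinese
    punctuation); CJK ideographs count as alphanumeric, so they are kept."""
    return "".join(ch for ch in text if ch.isalnum()).lower()


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Character n-grams of the normalized text; no word segmenter needed."""
    s = normalize(text)
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(texts: Dict[str, str],
                         threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (id1, id2, similarity) for every pair of reviews whose bigram
    Jaccard similarity reaches `threshold` (an illustrative value)."""
    grams = {rid: char_ngrams(t) for rid, t in texts.items()}
    pairs = []
    for a, b in combinations(sorted(grams), 2):
        sim = jaccard(grams[a], grams[b])
        if sim >= threshold:
            pairs.append((a, b, round(sim, 3)))
    return pairs
```

The exhaustive pairwise loop is quadratic in the number of reviews; at marketplace scale, MinHash with locality-sensitive hashing is a common way to approximate the same Jaccard comparison without checking every pair.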

Dataset

For the dataset, we crawl data from JD and TMALL, following their respective data policies. JD and TMALL are the top two B2C websites in China. According to a recent report published by iiMedia Research, together they held 83.8% of the retail market during the first half of 2018 (iiMedia, 2018). We also choose these companies because they presumably have advanced regulations against review spam. To examine the reviews they host thoroughly, we conduct several investigations.

Conclusions

To address the lack of serious consideration of graphic reviews in the field of review spam detection, we conducted a comprehensive study that covers six types of duplication pertaining to images, videos and texts. Using datasets crawled from JD and TMALL, we verified the feasibility of our approach and arrived at some interesting conclusions: 1) graphic spam is as severe as text spam; 2) the replication of photos from other posts is more prevalent among

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China, under grants 61802247 and 61801285. We thank the anonymous reviewers for their constructive comments.

References (55)

M. Barni et al.
Aligned and non-aligned double JPEG detection using convolutional neural networks
J. Vis. Commun. Image Represent.
(2017)

J. Chen et al.
Design, synthesis and biological evaluation of novel nitric oxide-donating protoberberine derivatives as antitumor agents
Eur. J. Med. Chem.
(2017)

L.M. Dang et al.
Face image manipulation detection based on a convolutional neural network
Expert Syst. Appl.
(2019)

T. Hennig-Thurau et al.
Electronic word-of-mouth via consumer-opinion platforms: what motivates consumers to articulate themselves on the internet?
J. Interact. Mark.
(2004)

A. Heydari et al.
Detection of review spam: a survey
Expert Syst. Appl.
(2015)

N. Hu et al.
Manipulation in digital word-of-mouth: a reality check for book reviews
Decis. Support Syst.
(2011)

Y. Liu et al.
A unified framework for detecting author spamicity by modeling review deviation
Expert Syst. Appl.
(2018)

D. Savage et al.
Detection of opinion spam based on anomalous rating deviation
Expert Syst. Appl.
(2015)

L. Zhang et al.
Online ballot stuffing: influence of self-boosting manipulation on rating dynamics in online rating systems
Telemat. Inform.
(2019)

S. Agarwal et al.
Detecting deep-fake videos from appearance and...
(2020)

E. Anderson et al.
Deceptive reviews: the influential...
(2013)

M. Bonomi et al.
Dynamic texture analysis for detecting fake faces in video sequences...
(2020)

J.K. Burgoon et al.
Detecting deception through linguistic analysis

G. Cao et al.
Contrast enhancement-based forensics in digital images
IEEE Trans. Inf. Forensics Secur.
(2014)

R. Cristin et al.
Illumination-based texture descriptor and fruitfly support vector neural network for image forgery detection in face images
IET Image Process.
(2018)

F. Deborah et al.
Ambiguity and rationality
J. Behav. Decis. Mak.
(1988)

C. Dellarocas et al.
What motivates consumers to review a product online? A study of the...
(2006)

H. Farid
Exposing digital forgeries from JPEG ghosts
IEEE Trans. Inf. Forensics Secur.
(2009)

Fxsjy
Retrieved from...

D. Güera et al.
Deepfake video detection using recurrent neural networks

Y. Guo et al.
Fake colorized image detection
IEEE Trans. Inf. Forensics Secur.
(2018)

V. Holub et al.
Low-complexity features for JPEG steganalysis using undecimated DCT
IEEE Trans. Inf. Forensics Secur.
(2015)

M. Huh et al.
Fighting fake news: image splice detection via learned self-consistency

iiMedia
China Retail Industry Market Research and Business Investment Decision Report
(2018)

N. Jindal et al.
Analyzing and detecting review spam

N. Jindal et al.
Opinion spam and analysis

N. Jindal et al.
Finding unusual review patterns using unexpected rules
