A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information
Introduction
Online consumer reviews (OCRs) play an essential role in assessing the quality of a product before consumers make informed decisions [1]. The past few years have witnessed increasing customer trust in OCRs [2]. According to a recent survey, nearly 80% of consumers trust OCRs as much as personal recommendations from friends or family, and more than 90% of consumers read OCRs before making a purchase decision.
However, as with many cases on the internet [3], fake online reviews are becoming increasingly prominent. An important reason is that the benefits of trading in fake reviews are evident and proven. The Federal Trade Commission (FTC) points out that spending on fake reviews yields a 20-fold payoff. Therefore, firms and retailers have strong incentives to leverage fake online reviews to influence consumers, contributing to a booming market for fake online reviews. For example, in 2019, the FTC found that Sunday Riley Skincare had misled consumers by posting fake online reviews of its products for nearly two years. Fake online reviews affect consumer trust and thus influence purchase decisions [4,5]. Moreover, early fake online reviews negatively impact subsequent reviews [6]. In essence, fake online reviews are posted by fake reviewers (opinion spammers), who often exhibit anomalous behavior. Fake reviewers are the leading cause of misinformation on e-commerce platforms. Therefore, it is critical and urgent to develop effective methods for detecting fake reviewers in order to maintain the authenticity of online reviews.
It is challenging to detect these reviewers due to the complexity of reviewers' behavior and textual information. Prior studies have derived behavior-related and text-related features and fed them into machine learning approaches, including supervised classification [7,8] and unsupervised classification [9,10], to detect fake reviewers automatically.
Despite the important contributions of these studies to fake reviewer detection, several limitations remain. First, although the importance of leveraging behavioral features in fake reviewer detection has been demonstrated [4,10], much of the research focuses on deriving novel behavioral features, which requires expensive human labor and expertise. Second, in addition to behavioral features, text features, such as n-grams (bag of words) [11], part-of-speech (POS) n-grams [12], and word embeddings [13], have been utilized to improve detection performance. However, these text features could negatively impact the detection of fake reviewers [8]. The bag of words (BoW) assumption treats a document as a bag of unordered words [14] and extracts features based on word frequency [15]. If an online review is full of informal words, abbreviations, and even obfuscated words, the feature vector for such a review is often very sparse and thus could negatively impact detection performance. Linguistic features such as POS n-grams can also be extracted from online reviews for fake reviewer detection, but such features may have difficulty detecting experienced fake reviewers. These reviewers attempt to sound convincing by using words or phrases that appear almost as frequently in genuine reviews as they do in fake reviews. They overuse only a small number of words in fake reviews, which makes the reviews sound genuine. Moreover, this small number of words may not appear in every fake review, which explains why n-grams are less effective at distinguishing fake from non-fake reviewers. Word embedding techniques such as Word2Vec capture limited semantic information because they assign a single static embedding vector to a word regardless of its context. Such techniques may negatively impact detection performance when reviews contain words whose meanings differ across contexts.
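The sparsity problem described above can be illustrated with a minimal sketch. The vocabulary and the two reviews below are hypothetical and invented purely for illustration; they are not drawn from any dataset used in the paper, and the helper names `bow_vector` and `sparsity` are assumptions.

```python
from collections import Counter


def bow_vector(review, vocabulary):
    """Count how often each vocabulary word occurs in the review (word order is ignored)."""
    counts = Counter(review.lower().split())
    return [counts[word] for word in vocabulary]


def sparsity(vector):
    """Return the fraction of entries in the vector that are zero."""
    return sum(1 for value in vector if value == 0) / len(vector)


# Hypothetical vocabulary and reviews, for illustration only.
vocabulary = ["great", "food", "service", "friendly",
              "staff", "delicious", "price", "recommend"]
clean_review = "great food friendly staff delicious food recommend"
noisy_review = "gr8 f00d luv it tbh staff r nice"  # informal and obfuscated words

clean_vec = bow_vector(clean_review, vocabulary)
noisy_vec = bow_vector(noisy_review, vocabulary)

print(clean_vec, sparsity(clean_vec))
print(noisy_vec, sparsity(noisy_vec))
```

In this toy case the obfuscated review matches only one vocabulary entry, so almost every feature is zero, which is the situation the BoW discussion above warns about.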
To address the first limitation, feature learning can be applied to behavioral features to improve detection performance. Feature learning is characterized by learning representations for specific tasks from raw data [16]. Compared with deriving novel behavioral features, feature learning requires less human labor and expertise and can learn the underlying patterns of raw behavioral data. To address the second limitation, we leverage the most advanced pre-trained language model, Longformer [17], to generate contextualized text representations from online reviews. Compared with traditional linguistic features, contextualized text representations can capture more semantic information from text inputs [18]. We can then utilize deep learning models to extract valuable features from the contextualized text representations and perform the corresponding classification tasks. Therefore, we propose a novel deep learning-based framework for fake reviewer detection. The framework has two key novelties:
- (1) We propose a behavior-sensitive feature extractor that leverages convolution filters to learn the underlying patterns of behavioral features.
- (2) We design a novel context-aware attention mechanism that incorporates the most advanced pre-trained language model (Longformer) and other deep learning classifiers to extract valuable features from online reviews.
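As a rough illustration of how a convolution filter can pick up a pattern in raw behavioral data, the sketch below slides one filter over a sequence of daily review counts. It is a generic 1D convolution, not the extractor proposed in the paper: the input sequence is hypothetical, and the filter weights and bias are set by hand here, whereas a trained model would learn them.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (stride 1, no padding) and return each response."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width)) + bias
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Clamp negative responses to zero."""
    return [max(0.0, value) for value in values]


# Hypothetical raw behavioral input: reviews posted per day by one reviewer.
daily_review_counts = [0, 0, 1, 0, 5, 6, 4, 0, 0, 1]

# Hand-set filter (an assumption): responds when three consecutive days are busy.
burst_filter = [1.0, 1.0, 1.0]

feature_map = relu(conv1d_valid(daily_review_counts, burst_filter, bias=-3.0))
pooled = max(feature_map)  # max pooling keeps the strongest response

print(feature_map)
print(pooled)
```

The strongest response occurs over the three-day window with the most activity, so the pooled value acts as a single learned-style feature summarizing bursty posting.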
The remainder of this paper is organized as follows. We first review related work in section 2. Then in section 3, we detail the major components of our research design. Section 4 describes the experimental process, including the datasets, experiment design, and model evaluation. In section 5, we describe the practical and managerial implications of the proposed model. Section 6 discusses the main findings and future research directions.
Section snippets
Deception detection techniques
We classify the literature on deception detection techniques into two main streams: machine-learning and non-machine-learning methods. The first stream of literature is built on information system design science, which leverages behavior and text features to identify suspicious reviewers. The second is based on methods outside the machine-learning context.
Fake reviewer detection design
Our research design comprises three components: behavior-sensitive feature extractor, context-aware attention mechanism, and fake reviewer detection, as shown in Fig. 1. We detail the underlying process of each component in the following subsections.
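To give a feel for what an attention step over contextualized token representations might look like, the following sketch applies plain dot-product attention with a softmax and then pools the tokens into one vector. This is a generic formulation and should not be read as the paper's context-aware attention mechanism; the token vectors are tiny hand-written stand-ins for Longformer outputs, and the query vector is an assumed parameter that a real model would learn.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_pool(token_vectors, query):
    """Score each token against the query, then return the weights and the weighted sum."""
    scores = [sum(t * q for t, q in zip(vector, query)) for vector in token_vectors]
    weights = softmax(scores)
    pooled = [
        sum(weight * vector[d] for weight, vector in zip(weights, token_vectors))
        for d in range(len(query))
    ]
    return weights, pooled


# Hypothetical 2-dimensional token representations (real ones would be much larger).
token_vectors = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]  # assumed learned context vector

weights, pooled = attention_pool(token_vectors, query)
print(weights)
print(pooled)
```

The token whose representation aligns best with the query receives the largest weight, so the pooled vector is dominated by the parts of the review the attention step considers most informative.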
Data testbed
We use data from http://Yelp.com that have been used in previous research [4,19,49]. More specifically, we use the YelpZip dataset shared by Rayana and Akoglu [9], which has also been adopted by many studies [19,50].
The YelpZip dataset has 260,227 users who wrote 608,598 reviews between July 2010 and November 2014, with 80,466 fake reviews and 528,132 genuine reviews. Since one reviewer can post multiple reviews, we use Fig. 4 to show the empirical distribution corresponding to the number of
Practical and managerial implications
Organized fraud behavior on review systems, especially posting fake reviews, has seriously harmed the fairness of e-commerce platforms and users' interests. A typical scam is that companies set up a social media account or discussion group and attract users who look for free merchandise to post fake reviews in exchange for a product or a cash bonus. Deceptive endorsements between a seller and reviewer are always secretive and difficult to track. According to Fakespot4,
Conclusions and future directions
Ensuring the fairness and justice of online reviews is a growing societal concern. Hence, there is an urgent need to address the misinformation problem caused by fake reviewers. While prior studies have made several attempts to detect fake reviewers, they do not fully exploit deep learning approaches for behavior-based and text-based features.
In this study, we propose a novel deep learning-based framework for detecting suspicious reviewers. We evaluate the proposed framework with
CRediT authorship contribution statement
Dong Zhang: Writing – original draft, Investigation, Software. Wenwen Li: Conceptualization. Baozhuang Niu: Conceptualization, Supervision, Validation, Writing – review & editing. Chong Wu: Conceptualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors are grateful to the editor and reviewers for their helpful comments. This work was supported by the National Natural Science Foundation of China (72201105, 72125006).
Dong Zhang is currently working as a postdoc in the School of Business Administration, South China University of Technology, Guangzhou, China. His papers have been published in journals such as the Journal of the Operational Research Society and Kybernetes. His research interests include data mining methods and online reviews.
References (54)
- et al. Fake online reviews: literature review, synthesis, and directions for future research. Decis. Support. Syst. (2020)
- et al. Document representation and feature combination for deceptive spam review detection. Neurocomputing (2017)
- et al. Unsupervised feature learning for spam email filtering. Comput. Electr. Eng. (2019)
- et al. Deep contextualized text representation and learning for fake news detection. Inf. Process. Manag. (2021)
- et al. Opinion spam detection: using multi-iterative graph-based model. Inf. Process. Manag. (2020)
- et al. From conflicts and confusion to doubts: examining review inconsistency for fake review detection. Decis. Support. Syst. (2021)
- et al. ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur. Gener. Comput. Syst. (2021)
- et al. Using a hybrid content-based and behaviour-based featuring approach in a parallel environment to detect fake reviews. Electron. Commer. Res. Appl. (2021)
- et al. Credibility of anonymous online product reviews: a language expectancy perspective. J. Manag. Inf. Syst. (2013)
- et al. Do professional Reviews affect online user choices through user reviews? An empirical study. J. Manag. Inf. Syst. (2016)
- Veracity assessment of online data. Decis. Support. Syst.
- What online reviewer behaviors really matter? Effects of verbal and nonverbal behaviors on detection of fake online reviews. J. Manag. Inf. Syst.
- Impact of prior reviews on the subsequent review process in reputation systems. J. Manag. Inf. Syst.
- Estimating the prevalence of deception in online review communities
- Detecting review manipulation on online platforms with hierarchical supervised learning. J. Manag. Inf. Syst.
- Collective opinion spam detection: bridging review networks and metadata
- Spotting opinion spammers using behavioral footprints
- A survey on fake review detection using machine learning techniques
- Fake review and reviewer detection through behavioral graph partitioning integrating deep neural network. Neural Comput. & Applic.
- Graph vs. bag representation models for the topic classification of web documents. World Wide Web
- Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell.
- Longformer: The Long-Document Transformer
- Detecting anomalous online reviewers: an unsupervised approach using mixture models. J. Manag. Inf. Syst.
- Bimodal distribution and co-bursting in review spam detection
- Exploiting burstiness in reviews for review spammer detection
- Spotting fake reviewer groups in consumer reviews
- Leveraging Transfer learning techniques- BERT, RoBERTa, ALBERT and DistilBERT for Fake Review Detection. Forum for Information Retrieval Evaluation
Wenwen Li is currently working as an Assistant Professor in the School of Management, Fudan University, Shanghai, China. Her papers have appeared in journals such as MIS Quarterly. Her research interests include online auctions and Bayesian updating models.
Baozhuang Niu is currently a Full Professor with the South China University of Technology, Guangzhou, China. He has authored/coauthored 11 papers in top journals, including Manufacturing & Service Operations Management (two papers), Production and Operations Management (six papers), and TRB (three papers), among other peer-reviewed journal papers. His research interests include supply chain operations.
Chong Wu is currently a Full Professor with the School of Economics and Management, Harbin Institute of Technology, Harbin, China. His papers have appeared in journals such as the IEEE Transactions on Knowledge and Data Engineering, Journal of the Association for Information Science and Technology, Information Sciences, and Knowledge-based Systems.