Abstract
Consumers are increasingly influenced by product reviews when purchasing goods or services. At the same time, deceptive reviews often mislead users. Manually identifying deceptive reviews among a massive number of reviews is inefficient and inaccurate, so automatic detection of deceptive reviews has become a research trend. Most existing methods are less effective because they lack a deep understanding of reviews. We propose a neural network method that uses bidirectional long short-term memory (BiLSTM) and feature combination to learn the representation of deceptive reviews. We conduct extensive experiments and demonstrate the effectiveness of the proposed method. Specifically, in the mixed-domain detection experiment, comparisons with other neural network-based methods show that our model is effective: BiLSTM gives an improvement of more than 3% in F1 score over the most advanced neural network method. Since feature selection plays an important role in this task, we combine features to further improve performance, obtaining an F1 value of 87.6%, which outperforms the state-of-the-art method. Moreover, in the cross-domain detection experiment, our method achieves an F1 value of 82.4% on the restaurant domain, about 6% higher than the state-of-the-art method, and it is also robust on the doctor domain.
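The general idea of combining a BiLSTM document representation with additional features can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the dimensions, the random untrained weights, the three-element `extra_features` vector, and all function names below are assumptions made for illustration. The sketch runs one LSTM forward and one backward over a sequence of word embeddings, concatenates the two final hidden states with the extra feature vector, and passes the result through a logistic output unit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) parameters for one LSTM direction (illustrative only)."""
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm_last_hidden(inputs, params, hidden_dim):
    """Run one LSTM over inputs of shape (seq_len, input_dim); return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in inputs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def bilstm_with_features(embeddings, extra_features, fwd, bwd, w_out, b_out, hidden_dim):
    """Concatenate forward and backward LSTM states with extra features, then score."""
    h_fwd = lstm_last_hidden(embeddings, fwd, hidden_dim)
    h_bwd = lstm_last_hidden(embeddings[::-1], bwd, hidden_dim)  # reversed sequence
    combined = np.concatenate([h_fwd, h_bwd, extra_features])
    return sigmoid(w_out @ combined + b_out)

# Hypothetical sizes; a real model would use trained embeddings and weights.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, n_features, seq_len = 8, 4, 3, 5
fwd = init_lstm(embed_dim, hidden_dim, rng)
bwd = init_lstm(embed_dim, hidden_dim, rng)
w_out = rng.normal(0.0, 0.1, 2 * hidden_dim + n_features)
b_out = 0.0

review = rng.normal(size=(seq_len, embed_dim))  # stand-in for word embeddings
extra_features = np.array([0.2, 0.0, 1.0])      # stand-in for hand-crafted features
score = bilstm_with_features(review, extra_features, fwd, bwd, w_out, b_out, hidden_dim)
print(f"deceptive probability (untrained): {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows where the feature vector enters the computation relative to the two LSTM directions.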
Acknowledgements
The work described in this paper is supported by the National Natural Science Foundation of China (61806049), the National Natural Science Foundation of China (31770768), the Natural Science Foundation of Heilongjiang Province of China (F2017001), the Heilongjiang Province Applied Technology Research and Development Program Major Project (GA18B301), the China State Forestry Administration Forestry Industry Public Welfare Project (201504307) and the China Postdoctoral Science Foundation (2017M611407).
About this article
Cite this article
Liu, W., Jing, W. & Li, Y. Incorporating feature representation into BiLSTM for deceptive review detection. Computing 102, 701–715 (2020). https://doi.org/10.1007/s00607-019-00763-y
DOI: https://doi.org/10.1007/s00607-019-00763-y
Keywords
- Deceptive review detection
- Bidirectional long short-term memory neural network
- Feature combination
- Representation learning