Incorporating feature representation into BiLSTM for deceptive review detection

Computing

Abstract

Consumers are increasingly influenced by product reviews when purchasing goods or services, yet deceptive reviews often mislead them. Manually identifying deceptive reviews among massive numbers of reviews is inefficient and inaccurate, so automatic detection has become a research trend. Most existing methods are less effective because they lack a deep understanding of review content. We propose a neural network method that combines bidirectional long short-term memory (BiLSTM) with feature combination to learn the representation of deceptive reviews. Extensive experiments demonstrate the effectiveness of the proposed method. In the mixed-domain detection experiment, comparisons with other neural network-based methods show that our model is effective: BiLSTM improves the F1 score by more than 3% over the most advanced neural network method. Because feature selection plays an important role in this task, we then combine features to improve performance further, reaching an F1 score of 87.6%, which outperforms the state-of-the-art method. In the cross-domain detection experiment, our method achieves an F1 score of 82.4% on the restaurant domain, about 6% higher than the state-of-the-art method, and it is also robust on the doctor domain.
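The abstract does not spell out how the BiLSTM output and the features are combined, so the following is only a minimal sketch of one common arrangement, not the authors' architecture. It assumes the review is already a sequence of word-embedding vectors, that the final forward and backward LSTM states are concatenated, and that a hand-crafted feature vector is appended to that representation. All names (`make_lstm_params`, `lstm_last_state`, `bilstm_with_features`), the dimensions, the random weights, and the feature values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_dim, hidden_dim, rng):
    # Hypothetical random weights; a real model would learn these.
    # Rows are grouped by gate: input, forget, output, candidate.
    rows, cols = 4 * hidden_dim, input_dim + hidden_dim
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    biases = [0.0] * rows
    return weights, biases


def lstm_last_state(sequence, params, hidden_dim):
    """Run a single-layer LSTM over the sequence and return the last hidden state."""
    weights, biases = params
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        z = list(x) + h  # current input concatenated with previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(weights, biases)]
        i = [sigmoid(p) for p in pre[:hidden_dim]]
        f = [sigmoid(p) for p in pre[hidden_dim:2 * hidden_dim]]
        o = [sigmoid(p) for p in pre[2 * hidden_dim:3 * hidden_dim]]
        g = [math.tanh(p) for p in pre[3 * hidden_dim:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def bilstm_with_features(sequence, features, fwd_params, bwd_params, hidden_dim):
    # Assumption: combination is plain concatenation of both directions
    # with the external feature vector.
    h_fwd = lstm_last_state(sequence, fwd_params, hidden_dim)
    h_bwd = lstm_last_state(sequence[::-1], bwd_params, hidden_dim)
    return h_fwd + h_bwd + list(features)


rng = random.Random(0)
hidden_dim, embed_dim = 4, 3
# Hypothetical review of five tokens, each a 3-dimensional embedding.
review = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(5)]
features = [0.2, 0.0, 1.0]  # hypothetical hand-crafted features
fwd = make_lstm_params(embed_dim, hidden_dim, rng)
bwd = make_lstm_params(embed_dim, hidden_dim, rng)
representation = bilstm_with_features(review, features, fwd, bwd, hidden_dim)
print(len(representation))  # 2 * hidden_dim + len(features)
```

In a full classifier, a vector like `representation` would typically be passed to a dense layer with a sigmoid or softmax output to label the review as deceptive or truthful.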

Acknowledgements

The work described in this paper is supported by the National Natural Science Foundation of China (61806049, 31770768), the Natural Science Foundation of Heilongjiang Province of China (F2017001), the Heilongjiang Province Applied Technology Research and Development Program Major Project (GA18B301), the China State Forestry Administration Forestry Industry Public Welfare Project (201504307), and the China Postdoctoral Science Foundation (2017M611407).

Author information

Corresponding author

Correspondence to Weipeng Jing.

Cite this article

Liu, W., Jing, W. & Li, Y. Incorporating feature representation into BiLSTM for deceptive review detection. Computing 102, 701–715 (2020). https://doi.org/10.1007/s00607-019-00763-y
