Effective Opinion Spam Detection: A Study on Review Metadata Versus Content
Journal of Data and Information Science, Pub Date: 2020-05-20, DOI: 10.2478/jdis-2020-0013
Ajay Rastogi, Monica Mehrotra, Syed Shafat Ali

Abstract

Purpose: This paper analyzes the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.

Design/methodology/approach: Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). To guard against classifier bias, we employ four classifiers so that the results do not reflect the behavior of any single model. We also propose a new set of features and compare it against features from some well-known related works. Experiments on two real-world datasets show the effectiveness of the different features in opinion spam detection.

Findings: Behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, which further establishes behavioral features as the dominant indicators of opinion spam. The features used in this work improve on existing features used in other related works. Furthermore, a computation-time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual features.

Research limitations: The analyses in this paper are limited to two well-known datasets from Yelp.com, YelpZip and YelpNYC.

Practical implications: The results can be used to improve opinion spam detection; researchers may focus on developing and improving feature engineering and selection techniques that rely more on metadata.

Originality/value: To the best of our knowledge, this study is the first to consider three perspectives (review-centric, reviewer-centric and product-centric) and four classifiers in analyzing the effectiveness of two major types of features for opinion spam detection. It also introduces some novel features that help improve the performance of opinion spam detection methods.
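The three settings and the behavioral/textual/hybrid split described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record layout, the sample reviews, the helper names (`group_by`, `behavioral_features`, `textual_features`, `build_views`) and the toy features (review count, mean rating, mean word count) are hypothetical and are not the feature set proposed in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical review records: (review_id, reviewer_id, product_id, rating, text).
REVIEWS = [
    ("r1", "u1", "p1", 5, "Great food and great service"),
    ("r2", "u1", "p2", 5, "Best place ever"),
    ("r3", "u2", "p1", 2, "Slow service and the food was cold"),
    ("r4", "u3", "p1", 4, "Nice atmosphere"),
]


def group_by(reviews, key_index):
    """Group review records by the field at position key_index."""
    groups = defaultdict(list)
    for review in reviews:
        groups[review[key_index]].append(review)
    return dict(groups)


def behavioral_features(group):
    """Metadata-only toy features (illustrative, not the paper's features)."""
    ratings = [review[3] for review in group]
    return {"review_count": len(group), "mean_rating": mean(ratings)}


def textual_features(group):
    """Content-only toy features (illustrative, not the paper's features)."""
    word_counts = [len(review[4].split()) for review in group]
    return {"mean_word_count": mean(word_counts)}


def build_views(reviews):
    """Build behavioral, textual and hybrid features for each of the three settings."""
    settings = {
        "review": {review[0]: [review] for review in reviews},  # review-centric
        "reviewer": group_by(reviews, 1),                       # reviewer-centric
        "product": group_by(reviews, 2),                        # product-centric
    }
    views = {}
    for setting, groups in settings.items():
        views[setting] = {}
        for key, group in groups.items():
            behavioral = behavioral_features(group)
            textual = textual_features(group)
            views[setting][key] = {
                "behavioral": behavioral,
                "textual": textual,
                "hybrid": {**behavioral, **textual},  # union of both feature sets
            }
    return views


views = build_views(REVIEWS)
print(views["reviewer"]["u1"]["behavioral"])
print(views["product"]["p1"]["hybrid"])
```

In a study of this kind, each feature set in each setting would then be passed to several classifiers and the resulting scores compared; that step is omitted here because the abstract does not name the classifiers.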

Updated: 2020-05-20