Detecting Spam Product Reviews in Roman Urdu Script, The Computer Journal

The Computer Journal (IF 1.5), Pub Date: 2020-12-21, DOI: 10.1093/comjnl/bxaa164
Naveed Hussain¹,², Hamid Turab Mirza³, Faiza Iqbal², Ibrar Hussain², Mohammad Kaleem⁴

In recent years, online customer reviews have become the main source for gauging public opinion about products and services. Manufacturers and sellers are therefore highly concerned with customer reviews, which can directly affect their businesses. Unfortunately, there is a growing trend of writing spam reviews to promote or demote targeted products or services. This practice, known as review spamming, raises questions about the authenticity and dependability of business processes that rely on customer reviews. Although the spam review detection (SRD) problem has gained much attention from researchers, existing SRD studies have mostly worked on datasets in English, Chinese, Arabic, Persian, and Malay. The objective of this research is therefore to identify spam in Roman Urdu reviews using different classification models based on linguistic and behavioral features. Each classifier is evaluated by its F1 score from three perspectives: (i) using linguistic features; (ii) using behavioral features combined with distributional and non-distributional aspects; and (iii) using the combination of linguistic and behavioral features (distributional and non-distributional aspects). The experimental evaluation shows the best result (F1 score: 0.96) for the combination of linguistic features and behavioral features with the distributional aspect of reviewers. Behavioral features with the distributional characteristic achieve an F1 score of 0.86, and linguistic features achieve an F1 score of 0.69. The outcome of this research can be used to increase the confidence of customers in the South Asian region. It can also help to reduce spam reviews in the South Asian region, particularly in Pakistan.
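The abstract reports every result as an F1 score, so a small illustration of how that metric differs from plain accuracy may help. The Python sketch below is not the authors' code: the labels are made-up toy data, and the feature names in `combine_features` are hypothetical placeholders, since the abstract does not list the actual linguistic or behavioral features. It only shows the general idea of concatenating two feature groups for one review and scoring predictions with F1.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of reviews whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the `positive` (spam) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no spam review was correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine_features(linguistic: Sequence[float], behavioral: Sequence[float]) -> List[float]:
    """Concatenate two feature groups into one vector for a single review."""
    return list(linguistic) + list(behavioral)


# Hypothetical feature values for one review (names are assumptions, not from the paper):
# linguistic = [average word length, share of capitalized tokens]
# behavioral = [reviews posted by this reviewer per day]
vector = combine_features([4.2, 0.1], [7.0])

# Toy labels: 1 = spam, 0 = not spam. The classifier misses one of two spam reviews.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(f"accuracy={accuracy(y_true, y_pred):.2f} f1={f1_score(y_true, y_pred):.2f}")
# prints: accuracy=0.90 f1=0.67
```

On imbalanced data such as this toy set, accuracy stays high even when half of the spam is missed, which is one reason a spam-detection study might prefer to report F1.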

Updated: 2020-12-21
