Information Systems (IF 3.7), Pub Date: 2021-07-29, DOI: 10.1016/j.is.2021.101865
Yuxin Liu, Li Wang, Tengfei Shi, Jinyan Li
Spam reviews mislead consumers' decision making and can seriously undermine fair trading in online markets. Existing methods for detecting spam reviews mainly focus on designing features from linguistic and psychological clues, but they hardly reveal the underlying semantics. Recent work applies deep learning to capture semantic features, but these models neither extract multi-granularity information from the text structure nor consider the mutual influence among sentences. We propose a hierarchical attention network in which distinct attention mechanisms are deliberately used at the two layers to capture important, comprehensive, and multi-granularity semantic information. At the first layer, we use an N-gram CNN to extract the multi-granularity semantics of the sentences. At the second layer, we use a combination of a convolution structure and a Bi-LSTM to extract important and comprehensive semantics in a document. Extensive experiments on public datasets demonstrate that our model outperforms the state-of-the-art baselines, improving the score in the mixed-domain setting to 89.3% (an absolute improvement of 4.8 points), in the Doctor domain to 92.8% (9.9 points), in the Hotel domain to 86.1% (2.4 points), and in the cross-domain setting to 84.7% (10.4 points).
Title: Detecting spam reviews via a hierarchical attention architecture with N-gram CNN and Bi-LSTM
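The two-level idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it covers only n-gram convolution with max-pooling at the sentence level and a simple dot-product attention over sentences, and it leaves out the convolution and Bi-LSTM that the paper uses at the document level. All names, dimensions, filter widths, and the random weights below are assumptions made for illustration.

```python
import math
import random


def ngram_conv_max(embeddings, filt, n):
    """Slide one filter of width n over a sentence, apply ReLU, max-pool.

    embeddings: list of word vectors (lists of floats, dimension d)
    filt: flat list of n * d weights (hypothetical, untrained)
    """
    best = 0.0  # ReLU floor; also returned when the sentence is shorter than n
    for start in range(len(embeddings) - n + 1):
        window = [x for vec in embeddings[start:start + n] for x in vec]
        activation = sum(w * x for w, x in zip(filt, window))
        best = max(best, activation)
    return best


def sentence_vector(embeddings, filters):
    """Concatenate pooled features from filters of several n-gram widths."""
    return [ngram_conv_max(embeddings, f, n)
            for n, bank in filters.items() for f in bank]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights


# Toy setup: every value here is an assumption, not taken from the paper.
rng = random.Random(0)
DIM = 4                  # word-embedding size
WIDTHS = (1, 2, 3)       # unigram, bigram, trigram filters
FILTERS_PER_WIDTH = 2


def rand_vec(size):
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


filters = {n: [rand_vec(n * DIM) for _ in range(FILTERS_PER_WIDTH)]
           for n in WIDTHS}
context = rand_vec(len(WIDTHS) * FILTERS_PER_WIDTH)

# A "review" of three sentences with 5, 3, and 7 random word vectors.
review = [[rand_vec(DIM) for _ in range(length)] for length in (5, 3, 7)]
sent_vecs = [sentence_vector(sentence, filters) for sentence in review]
doc_vec, sent_weights = attention_pool(sent_vecs, context)
```

Here `sent_vecs` holds one multi-granularity feature vector per sentence, `sent_weights` shows how much each sentence contributes, and `doc_vec` is the pooled review representation that a classifier would consume. In a trained model the filters and the context vector would be learned rather than sampled at random.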