Spam detection and high-quality features to analyse question–answer pairs
The Electronic Library (IF 1.675), Pub Date: 2020-11-26, DOI: 10.1108/el-05-2020-0120
Hei Chia Wang, Yu Hung Chiang, Si Ting Lin

Purpose

In community question and answer (CQA) services, user subjectivity and the limits of users' knowledge mean that answer quality can vary drastically, from highly relevant answers to irrelevant or even spam answers. Previous studies of CQA portals have faced two important issues: answer quality analysis and spam answer filtering. Therefore, the purposes of this study are to filter spam answers in advance using a two-phase identification method and then to automatically classify the different types of question and answer (QA) pairs by deep learning. Finally, the study presents a comprehensive analysis of answer quality prediction for different types of QA pairs.

Design/methodology/approach

This study proposes an integrated model with a two-phase identification method that filters spam answers in advance and uses a deep learning method [recurrent convolutional neural network (R-CNN)] to automatically classify various types of questions. Logistic regression (LR) is further applied to examine which answer quality features significantly indicate high-quality answers to different types of questions.
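The logistic regression step can be illustrated with a minimal sketch. This is not the authors' implementation: the feature values, labels, learning rate and function names below are made up for illustration, and the three columns merely borrow the feature names mentioned in the findings (answer length, author ranking, common words). The sketch fits a plain logistic regression by batch gradient descent and reports the learned coefficients; it does not reproduce the significance testing used in the study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic_regression(X, y, lr=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical, hand-made data: each row is
# [answer_length, author_ranking, common_words], all scaled to [0, 1].
# Label 1 marks a high-quality answer, 0 a low-quality one.
X = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.5], [0.6, 0.3, 0.6],
    [0.3, 0.4, 0.2], [0.2, 0.1, 0.3], [0.1, 0.3, 0.1], [0.4, 0.7, 0.2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic_regression(X, y)
for name, w in zip(["answer_length", "author_ranking", "common_words"], weights):
    print(f"{name}: {w:+.3f}")
```

A positive coefficient means that larger values of the feature push the model towards the high-quality class. Fitting one such model per question type would be one way to compare which features matter for which type, although the abstract does not say exactly how the authors organised this comparison.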

Findings

There are four prominent findings. (1) This study confirms that conducting spam filtering before an answer quality analysis can reduce the proportion of high-quality answers that are misjudged as spam answers. (2) The experimental results show that answer quality prediction improves when question types are included. (3) The analysis results for different classifiers show that the R-CNN achieves the best macro-F1 score (74.8%) in the question type classification module. (4) Finally, the LR results show that author ranking, answer length and common words can significantly affect answer quality for different types of questions.
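For readers unfamiliar with the macro-F1 metric reported for the question type classifier, the following sketch shows how it is computed: F1 is calculated separately for each class and the per-class scores are averaged with equal weight, so rare question types count as much as common ones. The label names and predictions here are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true label but was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical question-type labels, for illustration only.
y_true = ["factual", "factual", "opinion", "opinion", "social", "social"]
y_pred = ["factual", "opinion", "opinion", "opinion", "social", "factual"]
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy example the per-class F1 scores are 0.5 for "factual", 0.8 for "opinion" and 2/3 for "social", and the macro-F1 is their plain average.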

Originality/value

The proposed system can simultaneously detect spam answers and provide users with a quick and efficient mechanism for retrieving high-quality answers to different types of questions in CQA. Moreover, this study further validates that crucial features affecting answer quality exist and that they differ among question types. Overall, the identification system automatically summarises high-quality answers for each type of question from the pool of messy answers in CQA, which can be very useful in helping users make decisions.




Updated: 2021-01-12