Multilayer Convolutional Neural Network to Filter Low Quality Content from Quora, Neural Processing Letters

Neural Processing Letters (IF 2.6), Pub Date: 2020-06-21, DOI: 10.1007/s11063-020-10284-x
Pradeep Kumar Roy

Question answering (QA) websites now play a crucial role in meeting Internet users' information needs. Quora is a growing QA platform where users get quick answers to their questions from their peers. Nonetheless, a significant number of questions remain unanswered for a long time. Questions that go unanswered for a long time, are opinion-based, need a debate to be answered, or have no valid answer fall into the Insincere question group. It is therefore important to weed out Insincere questions to maintain the integrity of the site. Quora has a huge number of such questions, far too many to filter manually. To address this problem, this paper proposes a multilayer convolutional neural network (CNN) model that helps minimize Insincere questions on the website. Two embeddings were created from the Quora dataset: (i) using Skip-gram, and (ii) using the Continuous Bag-of-Words (CBOW) model. These embeddings and pre-trained GloVe embedding vectors were used for system development. The proposed model needs only the question text to predict whether a question is Insincere, and is therefore free from manual feature engineering. The experimental results indicate that the proposed multilayer CNN model outperforms earlier work, achieving an F1-score of 0.98 in the best case.
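
The abstract does not specify the network's layers, so the following is only a minimal sketch, assuming a common design for CNN text classifiers: questions are tokenized and padded to a fixed length, mapped to word vectors, passed through two stacked 1-D convolutions with ReLU, reduced by global max pooling, and scored by a sigmoid output unit. It also shows the binary F1-score used to report results. All function names (`embed`, `conv1d_relu`, `init_model`, `predict_proba`, `f1_score`), the hyperparameters, and the random vectors standing in for the Skip-gram, CBOW, or GloVe embeddings are hypothetical, and the weights are untrained.

```python
import math
import random

# Hypothetical hyperparameters chosen for illustration; the abstract does not
# state the paper's filter counts, kernel widths, or maximum sequence length.
EMB_DIM = 8          # embedding dimension
MAX_LEN = 12         # questions are padded/truncated to this many tokens
KERNEL = 3           # convolution window width (tokens)
FILTERS = (16, 16)   # output channels of two stacked convolutional layers


def embed(text, table, dim=EMB_DIM, max_len=MAX_LEN):
    """Map a question to a fixed-length list of word vectors.

    `table` stands in for Skip-gram, CBOW, or GloVe vectors; unknown tokens
    and padding positions use a zero vector.
    """
    zero = [0.0] * dim
    tokens = text.lower().split()[:max_len]
    seq = [table.get(tok, zero) for tok in tokens]
    return seq + [zero] * (max_len - len(seq))


def conv1d_relu(seq, kernels, biases):
    """'Valid' 1-D convolution over time followed by ReLU.

    Each kernel is a k x d_in matrix; output length is len(seq) - k + 1.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        out.append([
            max(0.0, b + sum(w * x
                             for w_row, x_row in zip(kern, window)
                             for w, x in zip(w_row, x_row)))
            for kern, b in zip(kernels, biases)
        ])
    return out


def init_model(rng, emb_dim=EMB_DIM, filters=FILTERS, k=KERNEL):
    """Randomly initialised (untrained) weights for the conv stack and output unit."""
    layers, d_in = [], emb_dim
    for f in filters:
        kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(k)]
                   for _ in range(f)]
        layers.append((kernels, [0.0] * f))
        d_in = f
    dense_w = [rng.uniform(-0.1, 0.1) for _ in range(d_in)]
    return layers, dense_w, 0.0


def predict_proba(text, table, model):
    """Forward pass: embeddings -> stacked conv+ReLU -> global max pool -> sigmoid."""
    layers, dense_w, dense_b = model
    h = embed(text, table)
    for kernels, biases in layers:
        h = conv1d_relu(h, kernels, biases)
    pooled = [max(channel) for channel in zip(*h)]  # max over time per channel
    z = dense_b + sum(w * x for w, x in zip(dense_w, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # probability the question is Insincere


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 with `positive` (e.g. Insincere = 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["why", "do", "people", "how", "can", "i", "learn", "python"]
    table = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in vocab}
    model = init_model(rng)
    # Untrained weights: the printed probability is not meaningful.
    print(predict_proba("Why do people learn Python", table, model))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a real system the random table would be replaced by vectors trained with Skip-gram or CBOW on the Quora questions, or by pre-trained GloVe vectors, and the weights would typically be learned by minimizing binary cross-entropy. F1 is a natural metric when the positive (Insincere) class is rare, since plain accuracy can look high for a classifier that seldom flags anything.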

Updated: 2020-06-21