Recency and quality-based ranking question in CQAs: A Stack Overflow case study, Information Processing & Management



Information Processing & Management (IF 7.4), Pub Date: 2021-03-23, DOI: 10.1016/j.ipm.2021.102552
Leandro Amancio, Carina F. Dorneles, Daniel H. Dalip

In Community-based Question Answering (CQA), recency ranking means placing recent answers at the top of the answer list. Here, "recent" does not refer to how new an answer's creation or edit date is, but to how current its content is. A good ranking should also consider answer quality, since a current but low-quality answer may be useless. Conversely, a high-quality answer with adequate text and references may be worthless if its information is obsolete. Combining the two criteria (recency and quality) is crucial: users usually want current solutions and need fast, easy access to the best answers, at the top of the ranking, to solve their problems quickly.

CQA sites usually provide voting mechanisms so that users can indicate the best-quality answers. However, voting does not take answer recency into account, and it is a slow and subjective process that does not keep up with the dynamism of new content. We therefore propose an automatic approach that considers recency as well as quality when generating the ranking. We use textual and non-textual features that indicate answer quality and recency, extracted from users' answers across the CQA environment as a whole. In our approach, quality is used to classify answers as good or poor using a threshold value, which yields two sets of answers: high quality and low quality. Both sets are then sorted by recency. Finally, the sets are concatenated to produce the final ranking, so that the best and most current answers occupy the top positions.

To verify the effectiveness of our proposal, we performed a case study on the Stack Overflow CQA site with a set of experiments using different combinations of characteristics and different learning-to-rank algorithms. Our main contributions are: (1) an approach to ranking the answers in a question dataset by the recency and quality of each answer; (2) a thorough evaluation of 9 learning-to-rank algorithms, showing that Coordinate Ascent and LambdaMART perform best on this task; (3) a feature analysis showing that features related to the age of the answer help improve ranking performance when recency and quality are both taken into account. Furthermore, as far as we know, this is the first work that considers the recency of an answer in this task.
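The three-step ranking described in the abstract (split answers by a quality threshold, sort each set by recency, concatenate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Answer` class, the `quality` and `recency` scores, and the threshold value 0.5 are all hypothetical placeholders. In the paper, these signals come from textual and non-textual features and learning-to-rank models that the abstract does not detail.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    quality: float  # hypothetical quality score in [0, 1]
    recency: float  # hypothetical content-recency score; higher means more current


def rank_answers(answers, quality_threshold=0.5):
    """Split by quality threshold, sort each set by recency, then concatenate."""
    # Step 1: classify answers as high or low quality (threshold is an assumption).
    high = [a for a in answers if a.quality >= quality_threshold]
    low = [a for a in answers if a.quality < quality_threshold]
    # Step 2: sort each set so the most current answers come first.
    high.sort(key=lambda a: a.recency, reverse=True)
    low.sort(key=lambda a: a.recency, reverse=True)
    # Step 3: high-quality answers precede low-quality ones in the final ranking.
    return high + low


# Made-up scores, for illustration only.
answers = [
    Answer("a1", quality=0.9, recency=0.2),
    Answer("a2", quality=0.3, recency=0.95),
    Answer("a3", quality=0.7, recency=0.8),
    Answer("a4", quality=0.4, recency=0.1),
]
ranking = [a.answer_id for a in rank_answers(answers)]
print(ranking)  # ['a3', 'a1', 'a2', 'a4']
```

In this toy input, the very current answer `a2` still ranks below both high-quality answers, because the quality split is applied before recency is considered.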


Updated: 2021-03-23
