Scalable Recommendation of Wikipedia Articles to Editors Using Representation Learning
arXiv - CS - Information Retrieval Pub Date : 2020-09-24 , DOI: arxiv-2009.11771
Oleksii Moskalenko, Denis Parra, and Diego Saez-Trumper

Wikipedia is edited by volunteer editors around the world. Considering the large amount of existing content (e.g. over 5M articles in English Wikipedia), deciding what to edit next can be difficult, both for experienced users, who usually have a large backlog of articles to prioritize, and for newcomers, who might need guidance in selecting the next article to contribute to. Therefore, helping editors find relevant articles should improve their performance and help retain new editors. In this paper, we address the problem of recommending relevant articles to editors. To do this, we develop a scalable system on top of Graph Convolutional Networks and Doc2Vec, learning how to represent Wikipedia articles and deliver personalized recommendations for editors. We test our model on editors' histories, predicting their most recent edits based on their prior edits. We outperform competitive implicit-feedback collaborative-filtering methods such as WMRF based on ALS, as well as a traditional IR method such as content-based filtering based on BM25. All of the data used in this paper is publicly available, including graph embeddings for Wikipedia articles, and we release our code to support replication of our experiments. Moreover, we contribute a scalable implementation of a state-of-the-art graph embedding algorithm, as current ones cannot efficiently handle the sheer size of the Wikipedia graph.
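The evaluation protocol described in the abstract (predict an editor's most recent edits from their prior edits) can be illustrated with a minimal sketch. This is not the authors' implementation: the article vectors below are hand-made toy values standing in for learned embeddings, the mean-of-prior-edits profile with cosine similarity is an assumed scoring rule, and the names `split_history`, `recommend`, and `recall_at_k` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def split_history(history, n_test=1):
    """Hold out the last n_test edits of a chronologically ordered history."""
    return history[:-n_test], history[-n_test:]

def recommend(article_vecs, prior, k=5):
    """Rank unseen articles by cosine similarity to the mean of the prior edits.

    Assumption: an editor is represented by the average of the unit-normalized
    vectors of the articles they already edited.
    """
    unit = article_vecs / np.linalg.norm(article_vecs, axis=1, keepdims=True)
    profile = unit[prior].mean(axis=0)
    scores = unit @ profile
    scores[prior] = -np.inf  # never recommend already-edited articles
    return np.argsort(-scores)[:k].tolist()

def recall_at_k(recommended, held_out):
    """Fraction of held-out edits that appear among the recommendations."""
    return len(set(recommended) & set(held_out)) / len(held_out)

# Toy stand-in for learned article embeddings (rows are article ids 0..4).
article_vecs = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
    [0.1, 0.9],
])

history = [0, 1, 2]                      # one editor's edits, oldest to newest
prior, held_out = split_history(history)
top = recommend(article_vecs, prior, k=1)
print(top, recall_at_k(top, held_out))
```

In this toy setup the held-out article lies close to the editor's earlier edits in the vector space, so it is ranked first. With real data the vectors would come from a trained representation model and the metric would be averaged over many editors.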

Updated: 2020-09-25