Virtual Proximity Citation (VCP): A Supervised Deep Learning Method to Relate Uncited Papers On Grounds of Citation Proximity
arXiv - CS - Digital Libraries Pub Date : 2020-09-25 , DOI: arxiv-2009.13294
Rohit Rawat

Citation-based approaches have made good progress in recommending research papers from the citations a paper contains. Citation proximity analysis, which uses the proximity of in-text citations to measure the relatedness of two research papers, performs better than co-citation analysis and bibliographic analysis. However, all of these approaches share one problem: the papers must be well cited. If documents are cited poorly or not at all, these approaches do not help. To overcome this problem, this paper discusses Virtual Citation Proximity (VCP), an approach that combines a Siamese neural network with the notions of citation proximity analysis and content-based filtering. To train the model, the actual distance between two citations in a document, measured as the word count between them, is used as ground truth. VCP is trained on Wikipedia articles, for which the actual word count is available, and this count is used to calculate the similarity between documents. The trained model can then estimate the relatedness of two documents in terms of how closely they would have been cited together, even if the documents are uncited. With basic neural networks, this approach shows a great improvement in predicting proximity over the approach that uses the Average Citation Proximity index value as ground truth. The results could be improved further by using a more complex neural network and proper hyperparameter tuning.
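The ground-truth signal described in the abstract, the word count between two citations in a document, can be illustrated with a small sketch. This is not the author's implementation. The bracketed numeric marker format (`[1]`, `[2]`), the function name `citation_word_distances`, and the choice to take the smallest gap when a citation occurs more than once are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Assumption: citations appear inline as bracketed numbers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def citation_word_distances(text):
    """Map each pair of citation ids to the word count between them.

    If a citation occurs several times, the smallest gap is kept
    (an illustrative choice, not taken from the paper).
    """
    # Put spaces around markers so each one becomes its own token.
    spaced = CITATION.sub(lambda m: f" {m.group(0)} ", text)

    positions = {}  # citation id -> word offsets at which it occurs
    word_index = 0
    for token in spaced.split():
        marker = CITATION.fullmatch(token)
        if marker:
            positions.setdefault(marker.group(1), []).append(word_index)
        elif any(ch.isalnum() for ch in token):
            # Count only tokens that contain a letter or digit as words.
            word_index += 1

    distances = {}
    for a, b in combinations(sorted(positions), 2):
        distances[(a, b)] = min(
            abs(i - j) for i in positions[a] for j in positions[b]
        )
    return distances


sample = "Deep models work well [1] and also scale [2] but older methods [3] remain."
print(citation_word_distances(sample))
# {('1', '2'): 3, ('1', '3'): 6, ('2', '3'): 3}
```

In a setup like the one the abstract describes, distances of this kind would serve as training targets for a pair of documents fed to the two branches of a Siamese network. How the raw word count is scaled into a similarity value is not stated in the abstract and is left out here.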

Updated: 2020-09-29