Hidden features identification for designing an efficient research article recommendation system,International Journal on Digital Libraries

Hidden features identification for designing an efficient research article recommendation system
International Journal on Digital Libraries (IF 1.6), Pub Date: 2021-04-30, DOI: 10.1007/s00799-021-00301-2
Arpita Chaudhuri, Nilanjan Sinhababu, Monalisa Sarma, Debasis Samanta

The digital repository of research articles is growing rapidly, and searching for the right paper is therefore becoming a tedious task for researchers. A research paper recommendation system is advocated to help researchers in this context. In designing such a system, proper representation of articles, and more specifically feature identification and extraction, are essential tasks. Existing approaches mainly consider direct features, which are readily available from research articles. However, certain features that are not readily available from a paper may greatly influence the performance of recommendation systems. This paper proposes four indirect features to represent a research article: keyword diversification, text complexity, citation analysis over time, and scientific quality measurement. Keyword diversification measures the uniqueness of a paper’s keywords, which helps to introduce variation into recommendations. Text complexity measurement helps to provide papers that match the user’s level of understanding. Citation analysis over time determines the relevancy of a paper. Scientific quality measurement helps to measure the scientific value of papers. The paper discusses formal definitions of the proposed indirect features, schemes to extract the feature values from a given research article, and metrics to measure them quantitatively. To substantiate the efficacy of the proposed features, a number of experiments have been carried out. The experimental results reveal that the proposed indirect features define a research article more uniquely than the direct features do. Given a research paper, extraction of the feature vector is computationally fast, which makes it feasible to filter a large corpus of papers in real time. More significantly, indirect features can be matched against the features of a user’s profile, thus satisfying an important criterion in collaborative filtering.
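
The abstract does not give the formal definitions of the indirect features, so the following Python sketch only illustrates the general idea of turning a paper into a small feature vector and matching it against a user profile. Every function name and formula here is a hypothetical stand-in, not the authors' method: keyword uniqueness is approximated by a mean inverse document frequency, text complexity by average sentence length, and citation analysis over time by a recency-weighted citation count. Scientific quality is left out because the abstract gives no basis for a proxy.

```python
import math
import re
from collections import Counter


def keyword_diversity(keywords, corpus_keywords):
    """Hypothetical proxy: mean smoothed inverse document frequency of the
    paper's keywords over a corpus (rarer keywords give a higher score)."""
    if not keywords:
        return 0.0
    n_docs = len(corpus_keywords)
    # Count in how many corpus papers each keyword appears.
    doc_freq = Counter(k for kws in corpus_keywords for k in set(kws))
    idf = [math.log((1 + n_docs) / (1 + doc_freq[k])) for k in keywords]
    return sum(idf) / len(idf)


def text_complexity(text):
    """Hypothetical proxy: average number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences) if sentences else 0.0


def citation_trend(citations_by_year, current_year, half_life=3.0):
    """Hypothetical proxy: citation count in which a citation that is
    `half_life` years old counts half as much as one from this year."""
    return sum(
        count * 0.5 ** ((current_year - year) / half_life)
        for year, count in citations_by_year.items()
    )


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Under these assumptions, a paper's vector could be built as `[keyword_diversity(...), text_complexity(...), citation_trend(...)]` and compared with a user-profile vector of the same shape using `cosine`. In practice the components would need to be scaled to comparable ranges first.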

Updated: 2021-04-30
