Automatic construction of academic profile: A case of information science domain
Journal of Information Science (IF 2.4), Pub Date: 2021-03-15, DOI: 10.1177/0165551521998048
Qian Geng, Ziang Chuai, Jian Jin

To provide junior researchers with domain-specific concepts efficiently, an automatic approach to academic profiling is needed. First, to obtain the personal records of a given scholar, typical supervised approaches often use structured data, such as Wikipedia infoboxes, as the training dataset; however, training a model directly on such data may lead to a severe mislabelling problem. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vectors of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Second, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain a large number of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. This method addresses a limitation of conventional supervised methods, which return only binary classification results and do not help researchers understand the relative importance of the selected concepts. Several categories of experiments on academic profiling, with corresponding benchmark datasets, demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.
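The contrast drawn in the abstract between binary classification results and the relative importance of concepts can be illustrated with a small sketch. This is not the authors' extraction method: it uses plain logistic regression only, leaves out the AdaBoost and learning-to-rank components, and every feature, term and training value below is hypothetical. It only shows that ordering candidate terms by a model's score keeps information that a yes/no label discards.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def score(x, weights, bias):
    """Predicted probability that a term is a key concept."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical toy features: [normalised term frequency, appears-in-title flag].
train_x = [[0.9, 1.0], [0.8, 1.0], [0.7, 0.0], [0.2, 0.0], [0.1, 0.0], [0.3, 1.0]]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = key concept, 0 = not (made-up labels)

weights, bias = train_logistic(train_x, train_y)

# Hypothetical candidate terms taken from a scholar's bibliography.
candidates = {
    "information retrieval": [0.85, 1.0],
    "citation analysis": [0.6, 0.0],
    "introduction": [0.15, 0.0],
}
scores = {term: score(x, weights, bias) for term, x in candidates.items()}

binary = {term: s >= 0.5 for term, s in scores.items()}   # yes/no view only
ranked = sorted(scores, key=scores.get, reverse=True)     # relative importance

for term in ranked:
    print(f"{term}: score={scores[term]:.3f}, selected={binary[term]}")
```

Two terms that both pass the 0.5 threshold are indistinguishable in the binary view, whereas the sorted scores still order them. A learning-to-rank model, as named in the abstract, would optimise that ordering directly rather than obtaining it as a by-product of classification.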




Updated: 2021-03-15