Identifying author heritage using surname data: An application for Russian surnames
Journal of the Association for Information Science and Technology (IF 2.8), Pub Date: 2019-01-25, DOI: 10.1002/asi.24104
Maria Karaulova 1, Abdullah Gök 2, Philip Shapira 1,3

This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary‐based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname‐based identification method and applied it to infer Russian heritage from suffix‐based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first‐name data to eliminate false‐positive results. The method achieved 98% precision and 94% recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long‐standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.
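The two-stage idea in the abstract (a suffix match on the surname, followed by a first-name check to drop false positives) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration: the suffix list, the first-name exclusion set, the function names, and the toy labelled records are hypothetical and are not the rule set or data used in the article.

```python
# Illustrative suffixes only (assumption); NOT the article's actual rules.
RUSSIAN_SUFFIXES = ("ov", "ova", "ev", "eva", "in", "ina", "sky", "skaya")

# Hypothetical first names treated as evidence against Russian heritage.
EXCLUDED_FIRST_NAMES = {"charles", "martin", "kevin"}


def has_russian_suffix(surname: str) -> bool:
    """Stage 1: match the surname ending against the suffix list."""
    return surname.lower().endswith(RUSSIAN_SUFFIXES)


def classify(first_name: str, surname: str) -> bool:
    """Stage 2: keep a suffix match unless the first name vetoes it."""
    if not has_russian_suffix(surname):
        return False
    return first_name.lower() not in EXCLUDED_FIRST_NAMES


def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy records: (first name, surname, label assumed for this example).
RECORDS = [
    ("Ivan", "Petrov", True),
    ("Anna", "Smirnova", True),
    ("Charles", "Darwin", False),  # "-in" suffix match, vetoed by first name
    ("John", "Smith", False),
    ("Lev", "Tolstoy", True),      # missed: "-oy" is not in the suffix list
]

predicted = [classify(first, last) for first, last, _ in RECORDS]
actual = [label for _, _, label in RECORDS]
precision, recall = precision_recall(predicted, actual)
```

On this toy set the first-name check removes the one suffix-only false positive, while the surname whose ending is absent from the list lowers recall. These numbers say nothing about the 98% precision and 94% recall reported for the real method.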

Updated: 2019-01-25