Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
Journal of Information Science (IF 2.4), Pub Date: 2021-05-31, DOI: 10.1177/01655515211018171
Jinseok Kim, Jenna Kim, Jinmo Kim

Chinese author names are known to be more difficult to disambiguate than names of other ethnic origins because many authors share the same surnames and forenames, which creates many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and then simplified by initialising their forenames. These two steps create counterfactual scenarios that reflect real-world indexing practices, in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names, whose numbers continue to grow in bibliographic data.
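
The counterfactual setup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four names, the toy `PINYIN` table, and the helper functions `transliterate`, `initialise`, and `blocks` are hypothetical and are not taken from the study's data or pipeline. It only shows how names that are distinct in Chinese characters can collapse into the same string once romanised, and further once forenames are reduced to initials.

```python
from collections import defaultdict

# Hypothetical toy romanisation table (toneless pinyin), for illustration only.
PINYIN = {"张": "Zhang", "章": "Zhang", "伟": "Wei", "维": "Wei", "薇": "Wei", "文": "Wen"}

def transliterate(name):
    """Romanise a name, assuming a one-character surname followed by the forename."""
    surname, forename = name[0], name[1:]
    romanised_forename = "".join(PINYIN[ch] for ch in forename).capitalize()
    return f"{PINYIN[surname]}, {romanised_forename}"

def initialise(english_name):
    """Reduce the forename of a 'Surname, Forename' string to its first letter."""
    surname, forename = english_name.split(", ")
    return f"{surname}, {forename[0]}"

def blocks(names, key):
    """Group names that share the same key, i.e. that look identical to an indexer."""
    grouped = defaultdict(list)
    for name in names:
        grouped[key(name)].append(name)
    return dict(grouped)

# Four made-up names that are all distinct in Chinese characters.
names = ["张伟", "张维", "章薇", "张文"]

by_chinese = blocks(names, lambda n: n)
by_english = blocks(names, transliterate)
by_initial = blocks(names, lambda n: initialise(transliterate(n)))

for label, grouped in [("Chinese", by_chinese), ("English", by_english), ("Initialised", by_initial)]:
    print(label, len(grouped), grouped)
```

In this toy case the number of distinct name strings shrinks at each step, which is the kind of added ambiguity the counterfactual scenarios are meant to capture. The machine learning step of the study itself is not reproduced here.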

Updated: 2021-06-01