Context-sensitive gender inference of named entities in text
Information Processing & Management (IF 7.4), Pub Date: 2020-11-11, DOI: 10.1016/j.ipm.2020.102423
Sudeshna Das, Jiaul H Paik

The gender information of named entities is an important prerequisite for many text analysis tasks such as gender bias detection and targeted advertising. Despite its valuable use cases, gender tagging of named entities has traditionally been database-reliant. The lack of open-source benchmarks is a major impediment to exploring the effectiveness of machine learning-based methods for this task. To address this gap, the article serves two main purposes. Firstly, we create four open-source datasets from well-known NER corpora and make them publicly available. Secondly, we propose a novel supervised learning approach based on the transformer network to identify the gender of named entities. We evaluate the proposed approach on four gender identification datasets. The proposed method outperforms two commercial database-reliant approaches and five deep sequence models, including BERT.



Updated: 2020-11-12