What’s in a name? – gender classification of names with character based machine learning models
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2021-05-12, DOI: 10.1007/s10618-021-00748-6
Yifan Hu, Changwei Hu, Thanh Tran, Tejaswi Kasturi, Elizabeth Joseph, Matt Gillingham

Gender information is no longer a mandatory input when registering for an account at many leading Internet companies. However, prediction of demographic information such as gender and age remains an important task, especially for intervening against unintentional gender/age bias in recommender systems. Therefore, it is necessary to infer the gender of those users who did not provide this information during registration. We consider the problem of predicting the gender of registered users based on their declared name. By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings. We propose a number of character-based machine learning models, and demonstrate that our models are able to infer the gender of users with much higher accuracy than baseline models. Moreover, we show that using the last names in addition to the first names improves classification performance further.
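
To make the idea of classifying gender from "the composition of the name strings" concrete, here is a minimal sketch of one possible character-based classifier. The abstract does not say which models the authors used, so this is not their method: it swaps in a plain multinomial naive Bayes over character n-grams, chosen only because it fits in a few lines of standard-library Python. The names `char_ngrams`, `CharNgramNaiveBayes`, and `TRAIN` are hypothetical, and the eight training names are hand-picked toy examples, not data from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_max=3):
    """Return character n-grams (n = 1..n_max) of a name padded with boundary markers."""
    padded = "^" + name.lower() + "$"  # "^" and "$" mark the start and end of the name
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, names, labels):
        self.label_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.ngram_counts[label].update(char_ngrams(name))
        # Vocabulary of every n-gram seen in training, used for smoothing.
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, name):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the class
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(name):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-picked training set for illustration only (an assumption, not paper data).
TRAIN = [("maria", "F"), ("anna", "F"), ("julia", "F"), ("sofia", "F"),
         ("john", "M"), ("robert", "M"), ("david", "M"), ("mark", "M")]

model = CharNgramNaiveBayes().fit([n for n, _ in TRAIN], [g for _, g in TRAIN])
```

Calling `model.predict("lucia")` on this toy model relies on suffix n-grams such as `ia$` that the unseen name shares with the training names, which is the kind of signal a character-based model can pick up. With only eight training names the sketch says nothing about real-world accuracy.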




Updated: 2021-05-12