Using First Name Information to Improve Race and Ethnicity Classification
Statistics and Public Policy Pub Date : 2018-01-01 , DOI: 10.1080/2330443x.2018.1427012
Ioan Voicu

ABSTRACT: This article uses a recent first name list to develop an improvement to an existing Bayesian classifier, namely the Bayesian Improved Surname Geocoding (BISG) method, which combines surname and geography information to impute missing race/ethnicity. The new Bayesian Improved First Name Surname Geocoding (BIFSG) method is validated using a large sample of mortgage applicants who self-report their race/ethnicity. BIFSG outperforms BISG, in terms of accuracy and coverage, for all major racial/ethnic categories. Although the overall magnitude of improvement is somewhat small, the largest improvements occur for non-Hispanic Blacks, a group for which the BISG performance is weakest. When estimating the race/ethnicity effects on mortgage pricing and underwriting decisions with regression models, estimation biases from both BIFSG and BISG are very small, with BIFSG generally having smaller biases, and the maximum a posteriori classifier resulting in smaller biases than through use of estimated probabilities. Robustness checks using voter registration data confirm BIFSG's improved performance vis-a-vis BISG and illustrate BIFSG's applicability to areas other than mortgage lending. Finally, I demonstrate an application of the BIFSG to the imputation of missing race/ethnicity in the Home Mortgage Disclosure Act data, and in the process, offer novel evidence that the incidence of missing race/ethnicity information is correlated with race/ethnicity.
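
As a rough illustration of the kind of Bayesian update the abstract describes, the sketch below combines a surname-based prior with first-name and geography likelihoods, then contrasts the estimated probabilities with a maximum a posteriori (MAP) classification. It assumes a naive-Bayes form in which first name and geography are conditionally independent of surname given race/ethnicity; the abstract does not spell out the formula, so treat that form as an assumption. The function names and all probability values are hypothetical and chosen only for illustration, not taken from the article or from any real name list.

```python
from typing import Dict


def bifsg_posterior(
    p_race_given_surname: Dict[str, float],
    p_first_given_race: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Assumed update: P(r | s, f, g) is proportional to P(r | s) * P(f | r) * P(g | r)."""
    unnormalized = {
        race: p_race_given_surname[race]
        * p_first_given_race[race]
        * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all categories have zero probability; cannot normalize")
    return {race: weight / total for race, weight in unnormalized.items()}


def map_class(posterior: Dict[str, float]) -> str:
    """Maximum a posteriori classification: pick the most probable category."""
    return max(posterior, key=posterior.get)


# Hypothetical inputs for a single person (illustrative values only).
p_race_given_surname = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.05}
p_first_given_race = {"white": 0.001, "black": 0.004, "hispanic": 0.0005, "asian": 0.0002}
p_geo_given_race = {"white": 0.0001, "black": 0.0003, "hispanic": 0.0001, "asian": 0.0001}

posterior = bifsg_posterior(p_race_given_surname, p_first_given_race, p_geo_given_race)
print(posterior)             # estimated probabilities for each category
print(map_class(posterior))  # single MAP label
```

Dropping the first-name term (setting every first-name likelihood to the same constant) reduces this sketch to a surname-plus-geography update, which is the sense in which it extends a BISG-style calculation. A downstream regression could use either the full probability vector or the single MAP label as its race/ethnicity proxy.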

Updated: 2018-01-01