Data Cleaning for Personal Credit Scoring by Utilizing Social Media Data: An Empirical Study
IEEE Intelligent Systems (IF 6.4), Pub Date: 2020-01-01, DOI: 10.1109/mis.2020.2972214
Xi Yu, Qi Yang, Ruiqi Wang, Runqing Fang, Mingsen Deng

With the accumulation of data on personal behavior and the development of machine learning models and algorithms, it is becoming possible to use social media data for personal credit scoring. In this article, we use systematic sampling to obtain social media data from Douban. Because these data contain many abnormal users, they are “real but false data” for the purposes of personal credit evaluation. To improve personal credit scoring, we propose three criteria for systematically cleaning the data: the power exponent $\gamma_i$ of the time interval distribution of user $i$, the user activity $A_i$, and the ratio $R_i$ of the out-degree to the in-degree of user $i$. We then use logistic regression to score users' personal credit before and after data cleaning, and find that the rank order of the credit scores changes significantly. This change is largely attributed to changes in the network structure after data cleaning. We believe this work is an important step toward using social media data to establish a credible personal credit evaluation system and to reduce credit risk in the current Internet finance industry.
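The three cleaning criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: $\gamma_i$ is estimated with the standard continuous maximum-likelihood estimator for a power law above a cutoff `x_min`, $A_i$ is taken to be the number of actions per unit time over the user's observed span, and $R_i$ is out-degree divided by in-degree in a directed follow graph. The function names, the input layout, and `x_min` are hypothetical, and no cleaning thresholds are shown because the abstract does not state any.

```python
import math


def power_law_exponent(intervals, x_min=1.0):
    """Continuous MLE of gamma for P(tau) ~ tau^(-gamma), tau >= x_min.

    Assumed estimator; the paper may fit gamma_i differently.
    """
    tail = [t for t in intervals if t >= x_min]
    log_sum = sum(math.log(t / x_min) for t in tail)
    if len(tail) < 2 or log_sum == 0.0:
        return None  # too little data to estimate an exponent
    return 1.0 + len(tail) / log_sum


def user_criteria(timestamps, follows, user, x_min=1.0):
    """Return (gamma_i, A_i, R_i) for one user under the assumed definitions.

    timestamps: dict mapping user -> list of action times (hypothetical layout)
    follows:    list of (follower, followee) directed edges (hypothetical layout)
    """
    times = sorted(timestamps.get(user, []))
    # Inter-event time intervals; zero-length gaps are dropped.
    intervals = [b - a for a, b in zip(times, times[1:]) if b > a]
    gamma = power_law_exponent(intervals, x_min)

    # Assumed activity: actions per unit time over the observed span.
    span = times[-1] - times[0] if len(times) > 1 else 0
    activity = len(times) / span if span > 0 else 0.0

    # Out-degree: edges leaving the user; in-degree: edges arriving.
    out_deg = sum(1 for src, _ in follows if src == user)
    in_deg = sum(1 for _, dst in follows if dst == user)
    ratio = out_deg / in_deg if in_deg > 0 else math.inf
    return gamma, activity, ratio


# Toy data, invented purely to exercise the functions.
toy_timestamps = {"u1": [0, 1, 3, 7, 15]}
toy_follows = [("u1", "u2"), ("u1", "u3"), ("u2", "u1")]
gamma_u1, activity_u1, ratio_u1 = user_criteria(toy_timestamps, toy_follows, "u1")
```

In a pipeline along these lines, users whose three values fall outside chosen ranges would be dropped before the logistic regression scoring step; how those ranges are chosen is left to the paper itself.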

Updated: 2020-01-01