Fair Clustering with Fair Correspondence Distribution, Information Sciences



Fair Clustering with Fair Correspondence Distribution
Information Sciences, Pub Date: 2021-09-08, DOI: 10.1016/j.ins.2021.09.010
Woojin Lee, Hyungjin Ko, Junyoung Byun, Taeho Yoon, Jaewook Lee


In recent years, fairness has become an important issue in machine learning. In clustering problems, fairness is defined in terms of consistency: the balance ratio of data with different sensitive attribute values should remain constant across clusters. Fairness matters in real-world applications. For example, when a recommendation system provides targeted advertisements or job offers based on a clustering of candidates, the minority group may not get the same level of opportunity as the majority group if the clustering result is unfair. In this study, we propose a novel distribution-based fair clustering approach. Assuming that the sample is drawn from a distribution biased by society, we seek clusters from a fair correspondence distribution. Our method uses the support vector method and a dynamical system to partition the entire data space into atomic cells, which are then reassembled fairly to form the clusters. Our theoretical results give an upper bound on the generalization error of the corresponding clustering function under the fair correspondence distribution when atomic cells are connected fairly, which leads to an algorithm for achieving fairness. Experimental results show that our algorithm increases fairness while reducing computation time on various datasets.
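The notion of a per-cluster balance ratio can be made concrete with a small sketch. The abstract does not give the paper's exact fairness measure, so the code below assumes a commonly used one: for each cluster, the ratio of the smallest to the largest sensitive-group count, with the overall balance taken as the minimum over clusters. The function names `cluster_balance` and `overall_balance` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Map each cluster label to its balance in [0, 1].

    Assumed definition (not taken from the paper): the smallest
    sensitive-group count in the cluster divided by the largest one.
    A value of 1.0 means all groups are equally represented; 0.0 means
    at least one group is missing from the cluster.
    """
    group_values = sorted(set(groups))
    counts = {}
    for cluster, group in zip(labels, groups):
        counts.setdefault(cluster, Counter())[group] += 1

    balance = {}
    for cluster, counter in counts.items():
        sizes = [counter.get(g, 0) for g in group_values]
        # max(sizes) > 0 because every listed cluster has at least one point.
        balance[cluster] = min(sizes) / max(sizes)
    return balance


def overall_balance(labels, groups):
    """Balance of the whole clustering: the worst per-cluster balance."""
    return min(cluster_balance(labels, groups).values())


# Toy data: two clusters, one binary sensitive attribute.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "b", "a", "b", "a", "a", "a", "b"]
per_cluster = cluster_balance(labels, groups)
worst = overall_balance(labels, groups)
```

In this toy data the two groups are evenly split in cluster 0 but not in cluster 1, so the overall balance is set by cluster 1. A fair clustering in the sense described above would keep every cluster's ratio close to the ratio in the full dataset.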


Updated: 2021-09-08
