Fair Clustering Under a Bounded Cost
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-06-14, DOI: arxiv-2106.07239
Seyed A. Esmaeili, Brian Brubach, Aravind Srinivasan, John P. Dickerson

Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness. In this model, enforcing fairness increases the cost of the clustering objective. The relative increase in the cost, the "price of fairness," can indeed be unbounded. Therefore, in this paper we propose to treat an upper bound on the clustering objective as a constraint on the clustering problem, and to maximize equality of representation subject to it. We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective, which generalizes the group egalitarian objective. We derive fundamental lower bounds on the approximation of the utilitarian and egalitarian objectives and introduce algorithms with provable guarantees for them. For the leximin objective we introduce an effective heuristic algorithm. We further derive impossibility results for other natural fairness objectives. We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms.
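The abstract does not define the objectives formally, so the following is only a minimal Python sketch of the bounded-cost idea under stated assumptions, not the authors' algorithm. It assumes a k-means cost, a hypothetical per-color "violation" (the largest gap between a color's share of some cluster and its share of the whole dataset), and a brute-force choice among a fixed list of candidate clusterings. The names `color_violations`, `best_under_budget` and `budget` are illustrative inventions; the paper's exact utilitarian, egalitarian and leximin definitions may differ.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Clustering = Tuple[List[Point], List[int]]  # (centers, center index assigned to each point)


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point],
                assign: Sequence[int]) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centers[a]))
        for pt, a in zip(points, assign)
    )


def color_violations(colors: Sequence[str], assign: Sequence[int]) -> Dict[str, float]:
    """Hypothetical measure: for each color h, max over clusters i of |p_i^h - r^h|,
    where p_i^h is h's share of cluster i and r^h is h's share of the dataset."""
    n = len(colors)
    overall = Counter(colors)
    sizes = Counter(assign)                      # only non-empty clusters appear
    per_cluster = Counter(zip(assign, colors))   # absent (cluster, color) pairs count as 0
    return {
        h: max(abs(per_cluster[(i, h)] / size - count_h / n) for i, size in sizes.items())
        for h, count_h in overall.items()
    }


def utilitarian(viol: Dict[str, float]) -> float:
    return sum(viol.values())       # total violation summed over groups


def egalitarian(viol: Dict[str, float]) -> float:
    return max(viol.values())       # violation of the worst-off group


def leximin_key(viol: Dict[str, float]) -> List[float]:
    # Worst violation first, then the next worst, ...; lexicographically smaller is fairer.
    return sorted(viol.values(), reverse=True)


def best_under_budget(points: Sequence[Point], candidates: Sequence[Clustering],
                      colors: Sequence[str], budget: float,
                      objective: Callable[[Dict[str, float]], Any]) -> Optional[Clustering]:
    """Brute force: keep candidates whose cost is within budget, then minimize unfairness."""
    feasible = [(c, a) for c, a in candidates if kmeans_cost(points, c, a) <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda ca: objective(color_violations(colors, ca[1])))


# Toy data: two spatial groups, each single-colored.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
colors = ["red", "red", "blue", "blue"]
spatial = ([(0.5,), (10.5,)], [0, 0, 1, 1])  # color-blind clustering
mixed = ([(5.0,), (6.0,)], [0, 1, 0, 1])     # perfectly balanced clustering

print(kmeans_cost(points, *spatial), kmeans_cost(points, *mixed))
print(best_under_budget(points, [spatial, mixed], colors, 50.0, utilitarian))
print(best_under_budget(points, [spatial, mixed], colors, 200.0, utilitarian))
```

On this toy data the color-blind clustering costs 1.0 and the balanced one costs 100.0, so full balance carries a price of fairness of 100 here. With a budget of 50 only the color-blind clustering is feasible; with a budget of 200 the balanced clustering is chosen because its violations are all zero. A real method would search over clusterings rather than a fixed candidate list.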

Updated: 2021-06-15