Learning a metric when clustering data points in the presence of constraints
Advances in Data Analysis and Classification (IF 1.4), Pub Date: 2019-05-16, DOI: 10.1007/s11634-019-00359-6
Ahmad Ali Abin, Mohammad Ali Bashiri, Hamid Beigy

Learning an appropriate distance measure under the supervision of side information has become a topic of significant interest within the machine learning community. In this paper, we address the problem of metric learning for constrained clustering by considering three important issues: (1) accounting for the importance degree of each constraint, (2) preserving the topological structure of the data, and (3) preserving natural distribution properties of the data. This work provides a unified way to handle these issues in constrained clustering by learning an appropriate distance measure. The first issue is modeled by injecting the importance degrees of the constraints directly into the objective function. The topological structure of the data is preserved by minimizing the reconstruction error of the data in the target space. Finally, the natural distribution properties of the data are preserved by using its proximity information. We propose two methods to address these issues: the first learns a linear transformation of the data into a target space (linear model), and the second uses kernel functions to learn an appropriate distance measure (non-linear model). Experiments show that accounting for these issues significantly improves clustering accuracy.
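To make the three ingredients concrete, the following Python sketch combines a weighted must-link/cannot-link term with a reconstruction (topology) penalty over a learned linear map L. It is an illustrative reading of the abstract, not the authors' formulation: the importance weights w, the hinge margin, the uniform neighbour weights, and the trade-off parameter lam are all assumptions introduced here.

import numpy as np

def weighted_constraint_loss(L, X, must_links, cannot_links, margin=1.0):
    # Pull weighted must-link pairs together and push cannot-link pairs
    # at least `margin` apart; w is the (assumed) importance degree.
    loss = 0.0
    for i, j, w in must_links:
        d = L @ (X[i] - X[j])
        loss += w * (d @ d)
    for i, j, w in cannot_links:
        d = L @ (X[i] - X[j])
        loss += w * max(0.0, margin - d @ d)
    return loss

def reconstruction_loss(L, X, k=5):
    # Topology-preserving term: each projected point should stay close to the
    # mean of its k nearest input-space neighbours (uniform neighbour weights
    # are used here only for brevity).
    Z = X @ L.T
    loss = 0.0
    for i in range(X.shape[0]):
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:k + 1]
        loss += np.sum((Z[i] - Z[nbrs].mean(axis=0)) ** 2)
    return loss

def objective(L, X, must_links, cannot_links, lam=0.1):
    # Weighted-constraint term plus the reconstruction penalty; lam is a
    # hypothetical hyper-parameter trading the two off.
    return weighted_constraint_loss(L, X, must_links, cannot_links) \
        + lam * reconstruction_loss(L, X)

In such a sketch, L could be optimized with any gradient-based solver over its entries; the kernelized (non-linear) variant mentioned in the abstract would replace the explicit map with kernel evaluations between data points.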
