The δ-Machine: Classification Based on Distances Towards Prototypes, Journal of Classification

Journal of Classification (IF 1.8), Pub Date: 2019-08-22, DOI: 10.1007/s00357-019-09338-0
Beibei Yuan, Willem Heiser, Mark de Rooij

We introduce the δ-machine, a statistical learning tool for classification based on (dis)similarities between the profiles of the observations and the profiles of a representation set consisting of prototypes. In this article, we discuss the properties of the δ-machine, propose an automatic decision rule for choosing the number of clusters for the K-means method from a predictive perspective, and derive variable importance measures and partial dependence plots for the machine. We performed five simulation studies to investigate the properties of the δ-machine. The first three studies investigated the selection of prototypes, different (dis)similarity functions, and the definition of the representation set. The results indicate that the Lasso is the best choice for selecting prototypes, that the Euclidean distance is a good dissimilarity function, and that finding a small representation set of prototypes gives sparse but competitive results. The remaining two simulation studies investigated the performance of the δ-machine with imbalanced classes and with unequal covariance matrices for the two classes. The results show that the δ-machine is robust to class imbalance, and that the four (dis)similarity functions performed equally well regardless of the covariance matrices. We also compare the classification performance of the δ-machine with that of three other classification methods on ten real datasets from the UCI database, and discuss two empirical examples in detail.
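The abstract describes the δ-machine only in outline: each observation is re-expressed by its dissimilarities towards a representation set of prototypes (for example K-means centroids), and a Lasso-penalized classifier on those dissimilarities keeps only the useful prototypes. The sketch below illustrates that pipeline in plain Python under stated assumptions: Euclidean distance, a representation set from a basic Lloyd's K-means with a fixed K, and an L1-penalized logistic regression fitted by proximal gradient (ISTA) standing in for the Lasso step. The function names (`dissimilarity_space`, `kmeans`, `fit_lasso_logistic`, `predict`) and the defaults for `lam`, `lr` and `iters` are hypothetical choices for illustration, not the authors' implementation, and the paper's automatic rule for choosing K is not reproduced.

```python
import math
import random

# Minimal sketch of a delta-machine-style classifier (assumed design, not the
# authors' code): Euclidean dissimilarities to K-means prototypes, followed by
# an L1-penalised (Lasso) logistic regression that selects prototypes.


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dissimilarity_space(X, prototypes):
    """Re-express each observation by its distances towards the prototypes."""
    return [[euclidean(x, p) for p in prototypes] for x in X]


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the k centroids form the representation set."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda c: euclidean(x, centers[c]))
            groups[nearest].append(x)
        for j, g in enumerate(groups):
            if g:  # an emptied cluster keeps its previous centroid
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(D, y, lam=0.01, lr=0.05, iters=1000):
    """L1-penalised logistic regression on dissimilarity features via ISTA.

    The intercept is unpenalised; soft-thresholding sets the weights of
    uninformative prototypes to exactly zero.
    """
    n, p = len(D), len(D[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for d, t in zip(D, y):
            r = sigmoid(b + sum(wj * dj for wj, dj in zip(w, d))) - t
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * d[j] / n
        b -= lr * grad_b
        for j in range(p):
            z = w[j] - lr * grad_w[j]
            # proximal (soft-thresholding) step for the L1 penalty
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b


def predict(X, prototypes, w, b):
    """Class 1 when the fitted log-odds are positive, otherwise class 0."""
    D = dissimilarity_space(X, prototypes)
    return [int(b + sum(wj * dj for wj, dj in zip(w, d)) > 0) for d in D]
```

As a usage sketch, draw two Gaussian clouds, build prototypes with `kmeans(X, k=4)`, map the data with `dissimilarity_space`, and fit `fit_lasso_logistic`; the prototypes whose weights survive the soft-thresholding are the ones the model retains. Swapping `euclidean` for another (dis)similarity function, or passing the full training set as `prototypes`, mirrors the alternatives that the simulation studies compare.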

Updated: 2019-08-22
