Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2020-07-22, DOI: 10.1007/s11634-020-00411-w
Kadri Umbleja, Manabu Ichino, Hiroyuki Yaguchi

Symbolic data is aggregated from larger traditional datasets in order to hide entry-specific details and to make it possible to analyse large amounts of data, such as big data, that could not otherwise be handled. Symbolic data may take many different and complex forms, such as intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining, but the usual methods are not sufficient to cluster these sophisticated data types accurately. Over the years, different approaches have been proposed, but they mainly concentrate on the “macroscopic” similarities between objects. Distributional data, such as symbolic data, is aggregated from large sets of data, so even the smallest microscopic differences and similarities become extremely important. This paper proposes a method for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points of comparison makes it possible to identify similarities in small sections of a distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and was found during experimentation to produce more adequate conceptual clusters. Furthermore, because it uses quantiles, the algorithm can compare different types of symbolic data easily without any additional complexity.
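The abstract does not define the dissimilarity or merging criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that each interval is uniform on its range and each histogram is uniform within its bins, turns every object into quantile values at levels 0, 1/m, ..., 1, and then merges clusters bottom-up. The measure `concept_size` is a hypothetical stand-in: for each section between adjacent quantile levels it takes the join (smallest left end, largest right end) over the cluster's members, normalises its width by the feature's range, and averages. All function names and the example data are illustrative.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's algorithm: objects are turned into
# quantile vectors and merged bottom-up with a stand-in "concept size".


def interval_quantiles(lo, hi, m):
    """Quantiles at levels 0, 1/m, ..., 1, assuming a uniform distribution on [lo, hi]."""
    p = np.linspace(0.0, 1.0, m + 1)
    return lo + p * (hi - lo)


def histogram_quantiles(edges, weights, m):
    """Quantiles of a histogram, assuming mass is uniform inside each bin.

    Weights are assumed strictly positive so the cumulative distribution is increasing.
    """
    edges = np.asarray(edges, dtype=float)
    w = np.asarray(weights, dtype=float)
    cdf = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    p = np.linspace(0.0, 1.0, m + 1)
    return np.interp(p, cdf, edges)  # piecewise-linear inverse of the CDF


def concept_size(members, Q, domain_span):
    """Hypothetical size of the concept spanned by `members`.

    For every feature and every section between adjacent quantile levels, take the
    join [min of left ends, max of right ends] over the members, normalise its width
    by the feature's domain span, and average. Adding members can only widen a join,
    so the size never decreases as clusters grow.
    """
    sub = Q[members]                     # shape (k, n_features, m + 1)
    lower = sub[:, :, :-1].min(axis=0)   # left end of each quantile section
    upper = sub[:, :, 1:].max(axis=0)    # right end of each quantile section
    return float(np.mean((upper - lower) / domain_span[:, None]))


def hierarchical_cluster(Q, domain_span):
    """Greedy agglomeration: repeatedly merge the pair with the smallest joined concept."""
    clusters = [[i] for i in range(len(Q))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = concept_size(clusters[a] + clusters[b], Q, domain_span)
                if best is None or s < best[0]:
                    best = (s, a, b)
        s, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((merged, s))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Usage: one feature, two interval objects and two histogram objects.
m = 4
Q = np.stack([
    interval_quantiles(0.0, 4.0, m),
    histogram_quantiles([0.0, 2.0, 4.0], [1, 1], m),
    interval_quantiles(6.0, 10.0, m),
    histogram_quantiles([6.0, 7.0, 10.0], [3, 1], m),
])[:, None, :]                                           # shape (4 objects, 1 feature, m + 1)
domain_span = Q.max(axis=(0, 2)) - Q.min(axis=(0, 2))    # per-feature range, here [10.0]

merges = hierarchical_cluster(Q, domain_span)
for members, size in merges:
    print(members, round(size, 3))
```

Running this prints `[0, 1] 0.1`, `[2, 3] 0.2` and `[0, 1, 2, 3] 0.7`. The interval on [0, 4] and the two-bin uniform histogram on the same range yield identical quantile vectors, which illustrates how a quantile representation lets different symbolic data types be compared directly. Because the stand-in measure only widens as members are added, the merge levels come out non-decreasing, a simple analogue of the monotone property mentioned above.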



Updated: 2020-07-22