Two approaches for clustering algorithms with relational-based data
Knowledge and Information Systems ( IF 2.5 ) Pub Date : 2019-07-23 , DOI: 10.1007/s10115-019-01384-9
João C. Xavier-Junior , Anne M. P. Canuto , Luiz M. G. Gonçalves

It is well known that relational databases still play an important role for many companies around the world. For this reason, the use of data mining methods to discover knowledge in large relational databases has become an interesting research issue. In the context of unsupervised data mining, for instance, conventional clustering algorithms cannot handle the particularities of relational databases in an efficient way. Some clustering algorithms for relational datasets have been proposed in the literature. However, most of these methods either apply complex and/or specific procedures to handle the relational nature of the data or do not capture that relational nature in an efficient way. Aiming to contribute to this important topic, in this paper, we will present two simple and generic approaches to handle relational-based data for clustering algorithms. One of them treats the relational data through the use of a hierarchical structure, while the second applies a weight structure based on relationship and attribute information. In presenting these two approaches, we aim to tackle relational-based datasets in a simple and efficient way, improving the efficiency of corporations that handle relational-based data in the unsupervised data mining context. In order to evaluate the effectiveness of the presented approaches, a comparative analysis will be conducted, comparing the proposed approaches with some existing approaches and with a baseline approach. In all analyzed approaches, we will use two well-known types of clustering algorithms (agglomerative hierarchical and K-means). In order to perform this analysis, we will use two internal cluster validity measures and one external cluster validity measure.
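The abstract does not give the details of either approach, so the following is only an illustrative sketch of the general idea behind a weighted distance that mixes attribute and relationship information. It is not the authors' method. The toy `customers` records, the Jaccard-based `relationship_distance`, the equal weights `w_attr` and `w_rel`, and the average-linkage merge rule are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data: each record has numeric attributes plus a set of
# related record IDs (standing in for rows reached through a foreign key).
customers = {
    "a": {"attrs": (1.0, 2.0), "related": {"p1", "p2"}},
    "b": {"attrs": (1.1, 1.9), "related": {"p1", "p2", "p3"}},
    "c": {"attrs": (8.0, 9.0), "related": {"p7"}},
    "d": {"attrs": (8.2, 9.1), "related": {"p7", "p8"}},
}


def attribute_distance(x, y):
    """Euclidean distance between two attribute tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5


def relationship_distance(r, s):
    """Jaccard distance between two sets of related record IDs."""
    union = r | s
    if not union:
        return 0.0
    return 1.0 - len(r & s) / len(union)


def combined_distance(u, v, w_attr=0.5, w_rel=0.5):
    """Weighted sum of attribute and relationship distances (assumed form)."""
    return (w_attr * attribute_distance(u["attrs"], v["attrs"])
            + w_rel * relationship_distance(u["related"], v["related"]))


def agglomerative(items, k, dist):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[name] for name in items]

    def linkage(c1, c2):
        total = sum(dist(items[p], items[q]) for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


result = sorted(agglomerative(customers, 2, combined_distance))
print(result)
```

In practice the attribute distance would usually be normalized first so that the two weights act on comparable scales; the raw Euclidean distance is kept here only to keep the sketch short.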

Updated: 2019-07-23
