Massive-scale graph-clustering-based data management based on multi-metrics, Future Generation Computer Systems

Massive-scale graph-clustering-based data management based on multi-metrics
Future Generation Computer Systems (IF 7.5), Pub Date: 2021-01-21, DOI: 10.1016/j.future.2020.12.018
Su Hu

With the explosive growth in the amount of data, we have entered the era of “big data”. Under these circumstances, managing such massive-scale data effectively and efficiently is becoming increasingly important. Traditionally, data are managed with a single, unified metric, which may be too restrictive because it cannot represent their social attributes (i.e., each sample and its socially connected neighbors in the feature space). To tackle this problem, we propose a multi-metric framework for data similarity management, in which the multiple metrics are derived through a dense subgraph mining algorithm. More specifically, for each sample, we first extract multi-channel features to characterize it, from which multiple massive-scale affinity graphs are constructed. Afterward, for the affinity graph from each feature channel, we discover its densely distributed subgraphs. Each sample’s social network can thereby be obtained, and the metric defined accordingly. Finally, we fuse the multiple metrics corresponding to the social networks from different feature channels. In this manner, we obtain the so-called multi-metric similarity measure, which is socially aware and optimally fuses multi-channel features. Comprehensive experimental results on six publicly available data sets demonstrate the competitiveness of the proposed multi-metric in massive-scale data classification and retrieval.
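The pipeline outlined in the abstract (per-channel affinity graphs, dense subgraph discovery, a per-channel metric, then fusion) might be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the paper's method: the abstract does not specify the graph construction, the dense subgraph mining algorithm, the metric definition, or the fusion rule. Here a Gaussian k-nearest-neighbor graph, a greedy peeling heuristic for the densest subgraph, a fixed co-membership bonus, and uniform fusion weights are all stand-ins, and every function name and parameter (`affinity_graph`, `densest_subgraph`, `channel_similarity`, `fuse`, `k`, `sigma`, `bonus`) is hypothetical.

```python
import numpy as np


def affinity_graph(features, k=5, sigma=1.0):
    """Symmetric k-nearest-neighbor affinity matrix with Gaussian weights (assumed construction)."""
    n = features.shape[0]
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    graph = np.zeros_like(weights)
    for i in range(n):
        nearest = np.argsort(-weights[i])[:k]  # k strongest neighbors of sample i
        graph[i, nearest] = weights[i, nearest]
    return np.maximum(graph, graph.T)  # keep an edge if either endpoint selected it


def densest_subgraph(graph):
    """Greedy peeling stand-in: drop the lowest-degree node each round, keep the densest set seen."""
    alive = np.ones(graph.shape[0], dtype=bool)
    degrees = graph.sum(axis=1)
    best_density, best_nodes = -1.0, alive.copy()
    while alive.any():
        # Density = total edge weight among alive nodes / number of alive nodes.
        density = degrees[alive].sum() / (2.0 * alive.sum())
        if density > best_density:
            best_density, best_nodes = density, alive.copy()
        candidates = np.where(alive)[0]
        victim = candidates[np.argmin(degrees[candidates])]
        alive[victim] = False
        degrees = degrees - graph[victim]  # remove the victim's edges from its neighbors
    return np.where(best_nodes)[0]


def channel_similarity(features, k=5, sigma=1.0, bonus=1.0):
    """Per-channel similarity: affinity plus a bonus for pairs inside the dense subgraph (assumed rule)."""
    graph = affinity_graph(features, k, sigma)
    members = densest_subgraph(graph)
    similarity = graph.copy()
    similarity[np.ix_(members, members)] += bonus
    np.fill_diagonal(similarity, 0.0)
    return similarity


def fuse(similarities, weights=None):
    """Weighted average of per-channel similarity matrices; uniform weights are a placeholder."""
    stacked = np.stack(similarities)
    if weights is None:
        weights = np.full(len(similarities), 1.0 / len(similarities))
    return np.tensordot(weights, stacked, axes=1)


# Toy data: two feature channels, each with six tightly grouped samples and four far-away ones.
rng = np.random.default_rng(0)
far_points = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
channels = [np.vstack([rng.normal(0.0, 0.05, size=(6, 2)), far_points]) for _ in range(2)]
fused = fuse([channel_similarity(c) for c in channels])
print(fused.shape)
```

In this toy setup, pairs of samples that fall inside the dense subgraph of a channel score higher in the fused matrix than pairs involving an isolated sample, which is the "socially aware" behavior the abstract describes in spirit. A learned or optimized set of fusion weights could be passed to `fuse` in place of the uniform default.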

Updated: 2021-02-02
