Massively Parallel Correlation Clustering in Bounded Arboricity Graphs, arXiv - CS - Distributed, Parallel, and Cluster Computing



arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-02-23, DOI: arxiv-2102.11660
Mélanie Cambus, Davin Choo, Havu Miikonen, Jara Uitto

Identifying clusters of similar elements in a set is a common objective in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the vertices correspond to the elements and each edge is either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges. Consider an input graph $G$ on $n$ vertices such that the positive edges induce a $\lambda$-arboric graph. Our main result is a 3-approximation (in expectation) algorithm that runs in $\mathcal{O}(\log \lambda \cdot \log \log n)$ MPC rounds in the sublinear memory regime. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA '18) on randomized greedy MIS and the PIVOT algorithm of Ailon, Charikar, and Newman (STOC '05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with worst case $(1+\epsilon)$-approximation guarantees in the special case of forests, where $\lambda=1$.
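To make the objective concrete, the following is a minimal sequential sketch of the classic PIVOT procedure and of the disagreement count described in the abstract. It is not the MPC algorithm of the paper, which is not specified on this page. The function names `pivot` and `disagreements`, the edge-list input format, and the convention that every pair not listed as positive is negative are assumptions made for illustration.

```python
import random
from itertools import combinations


def disagreements(n, positive_edges, cluster_of):
    """Count positive inter-cluster pairs plus negative intra-cluster pairs.

    Assumption: vertices are 0..n-1 and every pair that is not listed in
    positive_edges is a negative edge of the complete graph.
    """
    pos = {frozenset(e) for e in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in pos
        # A pair disagrees when it is positive but split, or negative but merged.
        if is_positive != same_cluster:
            cost += 1
    return cost


def pivot(n, positive_edges, rng):
    """Sequential PIVOT: process vertices in a uniformly random order; each
    still-unclustered vertex becomes a pivot and takes its still-unclustered
    positive neighbours into its cluster.

    Returns a list mapping each vertex to the pivot that labels its cluster.
    """
    adj = [set() for _ in range(n)]
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)

    order = list(range(n))
    rng.shuffle(order)

    cluster_of = [None] * n
    for p in order:
        if cluster_of[p] is not None:
            continue  # already absorbed by an earlier pivot
        cluster_of[p] = p
        for w in adj[p]:
            if cluster_of[w] is None:
                cluster_of[w] = p
    return cluster_of


if __name__ == "__main__":
    # Positive path 0 - 1 - 2; the pair (0, 2) is negative.
    path = [(0, 1), (1, 2)]
    clustering = pivot(3, path, random.Random(0))
    print(clustering, disagreements(3, path, clustering))
```

The pivots chosen this way form the greedy maximal independent set of the positive graph with respect to the random order, which is the link to randomized greedy MIS mentioned in the abstract. On the three-vertex path in the usage example, every pivot order leaves exactly one disagreement: either the negative pair (0, 2) ends up inside one cluster, or one positive edge is cut.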


Updated: 2021-02-24
