Approximating $(k,\ell)$-Median Clustering for Polygonal Curves


arXiv - CS - Computational Geometry. Pub Date: 2020-09-03, DOI: arxiv-2009.01488
Maike Buchin, Anne Driemel, and Dennis Rohde

In 2015, Driemel, Krivošija and Sohler introduced the $(k,\ell)$-median problem for clustering polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks for $k$ median curves of at most $\ell$ vertices each that minimize the sum, over all input curves, of the Fréchet distance to the closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in $\mathbb{R}^d$ and achieves approximation factor $(1+\epsilon)$ with respect to the clustering costs. The algorithm has worst-case running time linear in the number of curves, polynomial in the maximum number of vertices per curve (their complexity), and exponential in $d$, $\ell$, $\epsilon$ and $\delta$, where $\delta$ is the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve whose cost is similar to that of an optimal median curve of complexity $\ell$, whose complexity is at most $2\ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
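To make the objective concrete, the following is a minimal sketch that only evaluates the $(k,\ell)$-median cost of a given set of candidate medians; it is not the approximation algorithm from the paper. As a loudly stated assumption, it uses the discrete Fréchet distance (computed over vertices only) as a simpler stand-in for the continuous Fréchet distance the abstract refers to. The function names `discrete_frechet` and `k_l_median_cost` are hypothetical and chosen for illustration.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Frechet distance between two vertex sequences.

    Assumption: this is a stand-in for the continuous Frechet distance;
    it only considers couplings of vertices, not points on edges.
    """
    n, m = len(p), len(q)
    # table[i][j] = discrete Frechet distance between p[:i+1] and q[:j+1]
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_prev = min(
                    table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
                )
                table[i][j] = max(best_prev, d)
    return table[n - 1][m - 1]


def k_l_median_cost(curves, medians, max_complexity):
    """Sum over all input curves of the distance to the closest median.

    Each median may have at most `max_complexity` vertices (the role of
    ell in the problem statement); len(medians) plays the role of k.
    """
    if any(len(c) > max_complexity for c in medians):
        raise ValueError("a median exceeds the allowed number of vertices")
    return sum(min(discrete_frechet(t, c) for c in medians) for t in curves)


if __name__ == "__main__":
    # Three input curves in the plane and k = 2 medians with ell = 2.
    curves = [
        [(0, 0), (1, 0), (2, 0)],
        [(0, 1), (1, 1), (2, 1)],
        [(0, 10), (2, 10)],
    ]
    medians = [[(0, 0), (2, 0)], [(0, 10), (2, 10)]]
    print(k_l_median_cost(curves, medians, max_complexity=2))
```

An algorithm for the problem searches over candidate medians to make this cost small. In the bicriteria setting described above, the returned medians may have up to $2\ell-2$ vertices, which in this sketch would correspond to relaxing `max_complexity` for the output while comparing against the optimum for $\ell$.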


Updated: 2020-11-04
