Approximating $(k,\ell)$-Median Clustering for Polygonal Curves
arXiv - CS - Computational Geometry Pub Date : 2020-09-03 , DOI: arxiv-2009.01488
Maike Buchin, Anne Driemel, and Dennis Rohde

In 2015, Driemel, Krivošija and Sohler introduced the $(k,\ell)$-median problem for clustering polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks for $k$ median curves of at most $\ell$ vertices each that minimize the sum, over all input curves, of the Fréchet distance to the closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in $\mathbb{R}^d$ and achieves approximation factor $(1+\epsilon)$ with respect to the clustering costs. The algorithm has worst-case running time linear in the number of curves, polynomial in the maximum number of vertices per curve (their complexity), and exponential in $d$, $\ell$, $\epsilon$ and $\delta$, where $\delta$ is the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve whose cost is similar to that of an optimal median curve of complexity $\ell$, whose complexity is at most $2\ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
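
To make the objective in the abstract concrete, the following is a minimal sketch of how the $(k,\ell)$-median cost of a candidate solution could be evaluated. It is not the authors' algorithm. As a loudly stated assumption, it substitutes the discrete Fréchet distance (computed by the standard dynamic program over vertex pairs) for the continuous Fréchet distance used in the paper, and the names `discrete_frechet`, `median_cost` and `max_vertices` are hypothetical.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Frechet distance between two curves given as vertex lists.

    Assumption: this is a stand-in for the continuous Frechet distance
    used in the paper; the two can differ on the same pair of curves.
    """
    n, m = len(p), len(q)
    # table[i][j] holds the distance between the prefixes p[:i+1] and q[:j+1].
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_prev = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
                table[i][j] = max(best_prev, d)
    return table[-1][-1]


def median_cost(curves, medians, max_vertices):
    """Sum, over all input curves, of the distance to the closest median.

    max_vertices is the allowed median complexity (hypothetical parameter):
    ell for the exact problem, 2*ell - 2 for the bicriteria relaxation.
    """
    if any(len(c) > max_vertices for c in medians):
        raise ValueError("a median curve exceeds the allowed number of vertices")
    return sum(min(discrete_frechet(t, c) for c in medians) for t in curves)
```

For instance, with three horizontal unit segments at heights 0, 1 and 10 as input and two medians at heights 0.5 and 10, each input curve is charged its distance to the nearer median, so the first two contribute 0.5 each and the third contributes 0.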

Updated: 2020-11-04