Exact Acceleration of K-Means++ and K-Means$\|$
arXiv - CS - Mathematical Software. Pub Date: 2021-05-06, DOI: arxiv-2105.02936
Edward Raff

K-Means++ and its distributed variant K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the K-Means++ and K-Means$\|$ methods have made them difficult to "best" from a holistic perspective. By considering the limited opportunities within seed selection to perform pruning, we develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means$\|$ that is faster in run-time while being algorithmically equivalent. For both algorithms we are able to reduce distance computations by over $500\times$. For K-Means++ this results in up to a $17\times$ speedup in run-time, and for K-Means$\|$ a $551\times$ speedup. We achieve this with simple, but carefully chosen, modifications to known techniques, which makes it easy to integrate our approach into existing implementations of these algorithms.
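To make the idea of exact triangle-inequality pruning during seeding concrete, here is a minimal Python sketch. It is not the paper's algorithm: it omits the dynamic priority queue and the specialized strategies the abstract mentions, and only applies one well-known bound. If a point x is at distance d(x) from its current nearest seed c, and a new seed s satisfies d(c, s) >= 2 d(x), then d(x, s) >= d(x), so the distance from x to s never needs to be computed. The function name `kmeanspp_seeds`, its `prune` flag, and the distance counter are hypothetical names chosen for illustration, and the sketch assumes the data contains more than `k` distinct points.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeanspp_seeds(points, k, rng, prune=True):
    """Illustrative k-means++ seeding (hypothetical helper, not the paper's code).

    Returns (seed_indices, number_of_distance_computations).
    Assumes `points` holds more than k distinct points.
    """
    n = len(points)
    first = rng.randrange(n)
    seeds = [first]
    # d[i]: distance from point i to its nearest seed so far.
    # owner[i]: position in `seeds` of that nearest seed.
    d = [dist(p, points[first]) for p in points]
    owner = [0] * n
    computed = n

    while len(seeds) < k:
        # Standard D^2 sampling: pick a point with probability
        # proportional to its squared distance to the nearest seed.
        weights = [x * x for x in d]
        new = rng.choices(range(n), weights=weights, k=1)[0]

        seed_d = None
        if prune:
            # Distances from the new seed to every existing seed.
            seed_d = [dist(points[new], points[s]) for s in seeds]
            computed += len(seeds)

        seeds.append(new)
        new_id = len(seeds) - 1

        for i in range(n):
            # Triangle inequality: if the new seed is at least twice as far
            # from point i's current seed as point i is, it cannot be closer.
            if prune and seed_d[owner[i]] >= 2.0 * d[i]:
                continue
            computed += 1
            di = dist(points[i], points[new])
            if di < d[i]:
                d[i] = di
                owner[i] = new_id

    return seeds, computed
```

Calling the function twice with identically seeded `random.Random` instances, once with `prune=True` and once with `prune=False`, should select the same seeds, because a skipped distance is one that could not have changed `d[i]`. How many computations are saved depends on the data; well-separated clusters prune far more than uniformly scattered points.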

Updated: 2021-05-10