Introduction to Core-sets: an Updated Survey
arXiv - CS - Computational Geometry Pub Date : 2020-11-18 , DOI: arxiv-2011.09384
Dan Feldman

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a sum for a specific query set of $k$ centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams from sensors such as GPS, audio or video that arrive at a cloud, or networks of weaker devices such as smartphones or robots. A core-set is a "small data" summarization of the input "big data", where every possible query has approximately the same answer on both data sets. Generic techniques enable efficient coreset maintenance of streaming, distributed and dynamic data. Traditional algorithms can then be applied on these coresets to maintain the approximated optimal solutions. The challenge is to design coresets with a provable tradeoff between their size and approximation error. This survey summarizes such constructions in a retrospective way that aims to unify and simplify the state of the art.
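
The requirement that "every possible query has approximately the same answer on both data sets" can be sketched as follows for the sum-of-distances clustering example. The notation is assumed here for illustration and does not appear in the abstract: $P$ is the input set, $C$ is the coreset with weights $w$, $\mathcal{Q}$ is the family of query sets of $k$ centers, and $\varepsilon \in (0,1)$ is the approximation error.

$$
\mathrm{cost}(P,Q) = \sum_{p \in P} \mathrm{dist}(p,Q), \qquad
\mathrm{cost}_w(C,Q) = \sum_{c \in C} w(c)\,\mathrm{dist}(c,Q),
$$

$$
\forall Q \in \mathcal{Q}: \quad (1-\varepsilon)\,\mathrm{cost}(P,Q) \;\le\; \mathrm{cost}_w(C,Q) \;\le\; (1+\varepsilon)\,\mathrm{cost}(P,Q).
$$

Under this reading, the tradeoff mentioned above is between the size $|C|$ and the error $\varepsilon$.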

Updated: 2020-11-19