Metric hull as similarity-aware operator for representing unstructured data
Pattern Recognition Letters ( IF 3.9 ) Pub Date : 2021-06-12 , DOI: 10.1016/j.patrec.2021.05.011
Matej Antol , Miriama Jánošová , Vlastislav Dohnal

Similarity searching has become widely utilized in many online services processing unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view of the data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then tediously browse these sets directly, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull, which encompasses a given set by selecting a few objects. Testing whether an object belongs to the set then becomes much faster. We verify this concept on synthetic Euclidean data and real-life image and text datasets and show its effectiveness and scalability. The metric hulls provide much faster and more compact representations when compared with commonly used ball representations.




Updated: 2021-06-30