Hollow-tree: a metric access method for data with missing values
Journal of Intelligent Information Systems (IF 2.3), Pub Date: 2019-07-09, DOI: 10.1007/s10844-019-00567-8
Safia Brinis, Caetano Traina, Agma J. M. Traina

Similarity search is fundamental to storing and retrieving the large volumes of complex data required by many real-world applications, and query-by-similarity is a useful mechanism for it. Because of their topological properties, metric similarity functions can be used to index data sets, which so-called metric access methods can then query effectively and efficiently. However, the variety of application domains and data types often leads to missing data, and data with missing values do not satisfy the requirements of metric similarity. As a consequence, missing data distort the index structure and bias the query answers. In this paper, we propose the Hollow-tree, a novel access method for retrieving data with missing attribute values. It employs new strategies for indexing and searching data elements that handle missing data when the cause of missingness is ignorable. The indexing strategy is based on a family of distance functions that measure the distance between elements with missing values, along with a set of policies that organize the elements in the index without distorting its internal structure. The searching strategy uses the fractal dimension of the data to return accurate query answers while including elements with missing values in the response. Experiments on a variety of real and synthetic data sets showed that, while other metric access methods deteriorate with small amounts of missing values, the Hollow-tree maintains almost 100% precision and recall for range queries and more than 90% for k-nearest neighbor queries, with up to 40% of missing values.
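
The abstract does not spell out the family of distance functions it refers to. Purely as an illustration of the problem, the sketch below computes a generic "partial" Euclidean distance that skips attributes missing in either element and rescales the result to the full dimensionality. The function name `partial_euclidean` and the use of `None` as the missing-value marker are assumptions made for this example; this is not the distance used by the Hollow-tree.

```python
import math


def partial_euclidean(a, b):
    """Illustrative partial Euclidean distance (an assumption, not the paper's function).

    Attributes equal to None are treated as missing. Only attributes observed
    in both elements contribute, and the squared sum is rescaled by
    (total dimensions / shared dimensions).
    """
    if len(a) != len(b):
        raise ValueError("elements must have the same number of attributes")
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        # No attribute is observed in both elements: the distance is undefined.
        return math.nan
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


# One incomplete element and two complete ones.
a = [0.0, None]
b = [0.0, 0.0]
c = [0.0, 5.0]
d_ab = partial_euclidean(a, b)
d_ac = partial_euclidean(a, c)
d_bc = partial_euclidean(b, c)
```

With these three elements, `d_ab` and `d_ac` are both zero while `d_bc` is 5, so the triangle inequality fails. This is one concrete way in which missing values can break the metric requirements that conventional metric access methods rely on.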

Updated: 2019-07-09