Scalable Feature Matching Across Large Data Collections, Journal of Computational and Graphical Statistics

Scalable Feature Matching Across Large Data Collections
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-06-02, DOI: 10.1080/10618600.2022.2074429
David Degras

Abstract

This article is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop fast algorithms with time complexity roughly linear in the number n of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as the dissimilarity function, which can reduce the $\binom{n}{2}$ matching problems between pairs of datasets to n problems and enables assignment costs to be calculated on the fly. To our knowledge, no other method applicable to the MDADC possesses the linear scaling and low-storage properties required for large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization performance. An application of feature matching to a large neuroimaging database is presented. The algorithms of this article are implemented in the R package matchFeat, available at github.com/ddegras/matchFeat. Supplementary materials for this article are available online.
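
One standard way to see why squared Euclidean distances allow the $\binom{n}{2}$ pairwise problems to collapse into n problems is the identity $\sum_{i<j}\lVert x_i - x_j\rVert^2 = n\sum_{i=1}^{n}\lVert x_i - \bar{x}\rVert^2$, where $\bar{x}$ is the mean of $x_1, \dots, x_n$: the total pairwise cost of the vectors matched to one label equals n times their spread around a common template. The Python sketch below checks this identity and runs a k-means-style alternating scheme that matches each dataset to a template and then updates the template. It is an illustrative assumption, not the matchFeat implementation: the function names (`match_features`, `best_assignment`, `pairwise_cost`, `centroid_cost`), the initialization from the first dataset, and the brute-force permutation search (standing in for a proper linear assignment solver such as the Hungarian algorithm, and usable only for a handful of features per dataset) are all hypothetical choices.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]


def pairwise_cost(matched):
    """Sum of squared distances over all n-choose-2 pairs of matched vectors."""
    n = len(matched)
    return sum(sq_dist(matched[i], matched[j])
               for i in range(n) for j in range(i + 1, n))


def centroid_cost(matched):
    """n times the sum of squared distances to the mean (equals pairwise_cost)."""
    c = mean_vec(matched)
    return len(matched) * sum(sq_dist(v, c) for v in matched)


def best_assignment(rows, template):
    """Hypothetical helper: match the rows of one dataset to the template labels.

    Brute force over all p! permutations, so only usable for tiny p; a real
    solver would use the Hungarian algorithm or similar. Costs are computed
    on the fly from rows and template; no pairwise cost matrix is stored.
    Returns perm with perm[k] = index of the row assigned to label k.
    """
    best_cost, best_perm = None, None
    for perm in permutations(range(len(rows))):
        cost = sum(sq_dist(rows[perm[k]], template[k]) for k in range(len(template)))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm


def match_features(datasets, max_iter=20):
    """Hypothetical k-means-style alternating matching sketch.

    datasets: list of n datasets, each a list of p feature vectors.
    Each sweep solves n assignment problems against one template
    instead of n-choose-2 pairwise problems.
    """
    n, p = len(datasets), len(datasets[0])
    template = [list(v) for v in datasets[0]]  # assumed init: first dataset
    perms = None
    for _ in range(max_iter):
        new_perms = [best_assignment(rows, template) for rows in datasets]
        if new_perms == perms:
            break  # assignments unchanged: fixed point reached
        perms = new_perms
        # Template update: mean of the vectors currently assigned to each label
        template = [mean_vec([datasets[i][perms[i][k]] for i in range(n)])
                    for k in range(p)]
    return perms, template


if __name__ == "__main__":
    # Toy data: each dataset is a shuffled, slightly shifted copy of 3 base points
    base = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
    shuffles = [(0, 1, 2), (2, 0, 1), (1, 2, 0), (2, 1, 0)]
    datasets = [[[base[s][0] + 0.1 * i, base[s][1] - 0.1 * i] for s in sh]
                for i, sh in enumerate(shuffles)]
    perms, template = match_features(datasets)
    for k in range(len(base)):
        # Base point behind the row matched to label k in each dataset
        print(k, [shuffles[i][perms[i][k]] for i in range(len(datasets))])
```

Each sweep solves n small assignment problems against the template, with costs recomputed from the data as needed rather than stored for every pair of datasets; this mirrors, in spirit, the linear-scaling and low-storage properties described in the abstract. Like k-means, such an alternating scheme only guarantees a non-increasing objective and may stop at a local optimum.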

Updated: 2022-06-02
