Distributed Complementary Binary Quantization for Joint Hash Table Learning
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2). Pub Date: 2020-02-14, DOI: 10.1109/tnnls.2020.2965992
Xianglong Liu , Qiang Fu , Deqing Wang , Xiao Bai , Xinyu Wu , Dacheng Tao

Building multiple hash tables is a highly successful technique for indexing gigantic data sets, simultaneously guaranteeing both search accuracy and efficiency. However, most existing multitable indexing solutions lack informative hash codes and strong table complementarity, and therefore suffer heavily from table redundancy. To address this problem, we propose a complementary binary quantization (CBQ) method that jointly learns multiple tables and the corresponding informative hash functions in a centralized way. Based on CBQ, we further design a distributed learning algorithm (D-CBQ) to accelerate training over large-scale distributed data sets. The proposed (D-)CBQ exploits the power of prototype-based incomplete binary coding to closely align the data distributions in the original space and the Hamming space, and further utilizes the nature of multi-index search to jointly reduce the quantization loss. (D-)CBQ possesses several attractive properties, including extensibility to long hash codes in the product space and scalability through linear training time. Extensive experiments on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed (D-)CBQ enjoys efficient computation, informative binary quantization, and strong table complementarity, which together enable it to significantly outperform the state of the art, with relative performance gains of up to 57.76%.
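To illustrate the multi-table indexing setting the abstract builds on, the sketch below constructs several hash tables over binary codes and unions their candidate buckets at query time (the "multi-index search" the paper exploits). This is a minimal, hypothetical stand-in using random-hyperplane codes, not the learned complementary quantization of (D-)CBQ; all names (`codes`, `query`, etc.) are illustrative. The point it demonstrates is that a neighbor missed by one table can still be recovered by another, which is exactly why table complementarity matters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 points in 32-D Euclidean space.
X = rng.standard_normal((1000, 32))

n_tables, n_bits = 4, 12

# One random-hyperplane hash function per table (an illustrative
# stand-in for learned complementary hash functions, not CBQ itself).
projections = [rng.standard_normal((32, n_bits)) for _ in range(n_tables)]

def codes(points, W):
    """Sign of the projection gives an n_bits binary code per point."""
    return (points @ W > 0).astype(np.uint8)

# Build one bucket table per hash function: code bytes -> point ids.
tables = []
for W in projections:
    buckets = {}
    for i, c in enumerate(codes(X, W)):
        buckets.setdefault(c.tobytes(), []).append(i)
    tables.append(buckets)

def query(q):
    """Union the candidate sets from every table (multi-index search)."""
    cand = set()
    for W, buckets in zip(projections, tables):
        c = codes(q[None, :], W)[0]
        cand.update(buckets.get(c.tobytes(), []))
    return cand

# A slightly perturbed copy of point 0 is very likely to land in
# point 0's bucket in at least one of the four tables.
q = X[0] + 0.01 * rng.standard_normal(32)
candidates = query(q)
```

Redundant tables would return nearly identical candidate sets, wasting memory and lookups; (D-)CBQ instead learns the tables jointly so their buckets complement each other.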
