Bio-Inspired Hashing for Unsupervised Similarity Search
arXiv - CS - Databases Pub Date : 2020-01-14 , DOI: arxiv-2001.04907
Chaitanya K. Ryali, John J. Hopfield, Leopold Grinberg, Dmitry Krotov

The olfactory circuit of the fruit fly Drosophila has inspired a new locality-sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low-dimensional hash codes, FlyHash produces sparse, high-dimensional hash codes and has been shown to outperform classical LSH algorithms empirically in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm, BioHash, that produces sparse, high-dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant, BioConvHash, that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast and scalable, and they yield compressed binary representations that are useful for similarity search.
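
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes for FlyHash: a random projection into a higher-dimensional space, followed by keeping a small number of the most active units to obtain a sparse binary code. It is not the authors' code, and it does not implement the learned BioHash rule. The function names (`make_projection`, `sparse_hash`, `hamming`), the parameters (`fan_in`, `k`), and all sizes are assumptions chosen for illustration.

```python
import random


def make_projection(input_dim, hash_dim, fan_in, seed=0):
    """Assumed sparse binary projection: each of the `hash_dim` output
    units sums `fan_in` randomly chosen input coordinates."""
    rng = random.Random(seed)
    return [rng.sample(range(input_dim), fan_in) for _ in range(hash_dim)]


def sparse_hash(x, projection, k):
    """Project x to len(projection) dimensions and keep the k most active
    units. The code is returned as the set of active indices, which is
    equivalent to a binary vector with exactly k ones."""
    activations = [sum(x[i] for i in idx) for idx in projection]
    order = sorted(range(len(activations)),
                   key=lambda j: activations[j], reverse=True)
    return frozenset(order[:k])


def hamming(a, b):
    """Hamming distance between two sparse binary codes stored as index sets."""
    return len(a ^ b)


# Illustrative sizes only: 50 input dimensions expanded to 400, 20 active units.
projection = make_projection(input_dim=50, hash_dim=400, fan_in=5, seed=0)
rng = random.Random(1)
x = [rng.random() for _ in range(50)]
near = [v + 0.01 * rng.random() for v in x]   # small perturbation of x
far = [rng.random() for _ in range(50)]       # unrelated random vector

code_x = sparse_hash(x, projection, k=20)
code_near = sparse_hash(near, projection, k=20)
code_far = sparse_hash(far, projection, k=20)
print(hamming(code_x, code_near), hamming(code_x, code_far))
```

In a sketch like this, similarity search amounts to ranking stored codes by Hamming distance to the query code. Because the projection here is drawn at random and never updated, it reflects the limitation the abstract attributes to FlyHash, namely that it cannot learn from data.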

Updated: 2020-10-09