Online multimedia retrieval on CPU–GPU platforms with adaptive work partition
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-10-14, DOI: 10.1016/j.jpdc.2020.10.001
Rafael Souza, André Fernandes, Thiago S.F.X. Teixeira, George Teodoro, Renato Ferreira

Nearest neighbor search is a core operation in several online multimedia services. These services must handle very large databases while minimizing the query response times observed by users. This is especially challenging because the services face fluctuating query workloads (query arrival rates), so they must adapt at run time to keep response times low as the load varies. In this paper, we address these challenges with a distributed-memory parallelization of product quantization nearest neighbor search, also known as IVFADC, for hybrid CPU–GPU machines. Our parallel IVFADC implements an out-of-GPU-memory execution scheme that lets the GPU be used even when the index does not fit in GPU memory, which is crucial for searching very large databases. Using the CPU and GPU together with work stealing reduced the average response time by 2.4× compared with using the GPU alone. In addition, our approach for adapting the system to fluctuating loads, called the Dynamic Query Processing Policy (DQPP), reduced response times by up to 5× versus the best static (BS) policy under moderate loads. The system achieved high query processing rates and near-linear scalability in all experiments. We evaluated the system on a machine with up to 256 NVIDIA V100 GPUs, processing a database of 256 billion SIFT feature vectors.
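The abstract names IVFADC (an inverted file over coarse centroids, with product-quantized residuals searched by asymmetric distance computation) without spelling out how a query is answered. The block below is a minimal, single-threaded sketch of that search step under stated assumptions: the coarse and sub-quantizer codebooks are tiny hand-picked toys (real systems typically train them with k-means), and every function name is hypothetical. It deliberately omits everything specific to this paper: the distributed-memory layout, CPU–GPU execution, the out-of-GPU-memory scheme, work stealing, and DQPP.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy sketch of IVFADC search; not the paper's implementation.
Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to v (the first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def split(v: Vector, m: int) -> List[Vector]:
    """Cut v into m contiguous sub-vectors of equal length."""
    step = len(v) // m
    return [v[j * step:(j + 1) * step] for j in range(m)]


def build_index(db, coarse, sub_codebooks):
    """Inverted file: coarse cell id -> list of (vector id, PQ codes of its residual)."""
    m = len(sub_codebooks)
    lists: Dict[int, List[Tuple[int, Tuple[int, ...]]]] = {i: [] for i in range(len(coarse))}
    for vid, v in enumerate(db):
        cell = nearest(v, coarse)
        residual = [x - c for x, c in zip(v, coarse[cell])]
        # Each residual sub-vector is replaced by the index of its nearest sub-centroid.
        codes = tuple(nearest(sub, sub_codebooks[j]) for j, sub in enumerate(split(residual, m)))
        lists[cell].append((vid, codes))
    return lists


def search(query, lists, coarse, sub_codebooks, nprobe, k):
    """Return the k (approximate distance, id) pairs with the smallest ADC distance."""
    m = len(sub_codebooks)
    # Visit only the nprobe coarse cells closest to the query.
    cells = sorted(range(len(coarse)), key=lambda i: sq_dist(query, coarse[i]))[:nprobe]
    candidates = []
    for cell in cells:
        residual = [x - c for x, c in zip(query, coarse[cell])]
        # Asymmetric step: the query residual stays exact; only database vectors are
        # quantized. One lookup table per subspace holds distances to every sub-centroid.
        table = [[sq_dist(sub, cent) for cent in sub_codebooks[j]]
                 for j, sub in enumerate(split(residual, m))]
        for vid, codes in lists[cell]:
            # Approximate distance = sum of m table lookups, no full-vector arithmetic.
            candidates.append((sum(table[j][c] for j, c in enumerate(codes)), vid))
    return heapq.nsmallest(k, candidates)


# Toy data: 4-D vectors, 2 coarse cells, 2 subspaces sharing one 5-entry grid codebook.
coarse = [(0, 0, 0, 0), (10, 10, 10, 10)]
grid = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
sub_codebooks = [grid, grid]
db = [(1, 1, -1, -1), (11, 9, 10, 10), (-1, 1, 0, 0), (9, 9, 11, 11)]
lists = build_index(db, coarse, sub_codebooks)
print(search((1, 1, -1, -1), lists, coarse, sub_codebooks, nprobe=1, k=2))  # [(0, 0), (6, 2)]
```

The residuals in this toy database happen to lie exactly on the grid, so the ADC distances equal the true squared distances; with real data they are approximations. Raising `nprobe` trades more lookup work for recall, which is the kind of per-query cost that a load-adaptive policy could tune, though the abstract does not say which knobs DQPP actually adjusts.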

Updated: 2020-10-30