BlueDBM
ACM Transactions on Computer Systems ( IF 2.0 ) Pub Date : 2016-09-17 , DOI: 10.1145/2898996
Sang-Woo Jun 1 , Ming Liu 1 , Sungjin Lee 1 , Jamey Hicks 2 , John Ankcorn 2 , Myron King 2 , Shuotao Xu 1 , Arvind 1

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf flash storage packaged as SSDs does not make effective use of flash storage because it incurs a great amount of additional overhead during flash device management and network access. In this article, we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency high-throughput intercontroller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics.
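The sizing and degradation claims in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: `servers_needed` uses the dataset and DRAM sizes quoted above with decimal units, while `slowdown` uses a simple weighted-average access-time model whose latency values (100 ns for DRAM, 100 µs for secondary storage) are assumptions chosen for illustration, not measurements from BlueDBM.

```python
import math


def servers_needed(dataset_tb: float, dram_gb_per_server: float) -> int:
    """Servers required to hold the whole dataset in DRAM.

    Uses decimal units (1 TB = 1000 GB) and ignores any DRAM reserved
    for the OS or application, so the result is a lower bound.
    """
    return math.ceil(dataset_tb * 1000 / dram_gb_per_server)


def slowdown(miss_fraction: float,
             dram_ns: float = 100.0,
             storage_ns: float = 100_000.0) -> float:
    """Average access time relative to an all-DRAM system.

    miss_fraction is the share of references that go to secondary storage.
    The default latencies are ASSUMED values for illustration only.
    """
    average_ns = (1 - miss_fraction) * dram_ns + miss_fraction * storage_ns
    return average_ns / dram_ns


if __name__ == "__main__":
    # Dataset and DRAM sizes are the ones quoted in the abstract.
    for dataset_tb in (5, 20):
        for dram_gb in (128, 256):
            count = servers_needed(dataset_tb, dram_gb)
            print(f"{dataset_tb} TB at {dram_gb} GB/server: {count} servers")

    # Miss fractions from the abstract; latencies are the assumed defaults.
    for miss in (0.05, 0.10):
        print(f"{miss:.0%} of references to storage: "
              f"{slowdown(miss):.1f}x slower on average")
```

Under this simplified model, even a small share of references to slow storage dominates the average access time, which is consistent with the sharp degradation the abstract describes for DRAM-centric systems. The exact factor depends entirely on the assumed latencies.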

Updated: 2016-09-17