Enabling fast and energy-efficient FM-index exact matching using processing-near-memory
The Journal of Supercomputing (IF 3.3), Pub Date: 2021-03-02, DOI: 10.1007/s11227-021-03661-3
Jose M. Herruzo, Ivan Fernandez, Sonia González-Navarro, Oscar Plata

Memory bandwidth and latency constitute a major performance bottleneck for many data-intensive applications. While high-locality access patterns take advantage of the deep cache hierarchies available in modern processors, unpredictable low-locality patterns cause a significant part of the execution time to be wasted waiting for data. One example of such memory-bound applications is the exact matching algorithm based on the FM-index, used in some well-known sequence alignment applications. Processing-Near-Memory (PNM) has been proposed as a strategy to overcome the memory wall problem: placing computation close to the data reduces data movement and so speeds up memory-bound workloads. This paper presents a performance and energy evaluation of two classes of processor architectures when executing the FM-index exact matching algorithm, taken as a reference algorithm for exact sequence alignment. One architecture class is processor-centric, based on complex cores and DDR3/4 SDRAM memory technology. The other architecture class is memory-centric, based on simple cores and ultra-high-bandwidth hybrid memory cube (HMC) 3D-stacked memory technology. The results show that the PNM solution improves performance by between 1.26× and 3.7× and reduces the energy consumption per operation by between 21× and 40×. In addition, a synthetic benchmark, RANDOM, was developed that mimics the memory access pattern of the FM-index exact matching algorithm but with a user-configurable operational intensity. This benchmark allows us to extend the evaluation to the class of algorithms with similar memory behaviour but running over a range of operational intensity values.
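To illustrate why FM-index exact matching is memory bound, the following is a minimal textbook sketch of backward search in Python. It is not the implementation evaluated in the paper: the function names `build_fm_index` and `count_exact_matches` are hypothetical, the suffix array is built naively, and the occurrence table is stored in full rather than sampled, as a practical aligner would likely do. The point to notice is that each step reads the occurrence table at positions `lo` and `hi` that depend on the previous step, so consecutive accesses have little spatial or temporal locality.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and full Occ table) for a short text."""
    text += "$"  # sentinel, assumed smaller than every other character
    # Naive suffix array construction; fine for short illustrative inputs only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_exact_matches(pattern, C, occ, n):
    """Count occurrences of pattern using backward search."""
    lo, hi = 0, n  # half-open suffix array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        # Data-dependent lookups: lo and hi jump around the Occ table,
        # which is the low-locality access pattern the abstract describes.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("ACGTACGTAC")
matches = count_exact_matches("ACG", C, occ, n)
```

For a genome-scale reference the occurrence table is far larger than any cache, so each of these two lookups per pattern character is likely to miss, which is the behaviour a PNM architecture aims to serve from nearby memory.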




Updated: 2021-03-02