Evaluating Multicore Algorithms on the Unified Memory Model
Scientific Programming (IF 1.672), Pub Date: 2009, DOI: 10.3233/spr-2009-0290
John E. Savage, Mohammad Zubair

One of the challenges to achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is already an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM handles different types of multicore processors, with varying degrees of cache sharing at different levels, in a single framework. We demonstrate that our model can be used to study a variety of multicore architectures on a variety of applications. In particular, we use it to analyze an option pricing problem based on the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We have implemented the algorithm on a system with two quad-core Intel Xeon 5310 1.6 GHz processors (8 cores in total). It achieves a peak performance of 19.5 GFLOPS, which is 38% of the theoretical peak of the multicore system. We demonstrate that our algorithm outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
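
For readers unfamiliar with the trinomial model mentioned in the abstract, the following is a minimal sketch of plain backward induction on a trinomial lattice for a European call. It is not the authors' cache-optimized algorithm, which the abstract does not describe. The function name, the parameter names, and the choice of the standard textbook parameterization (up factor `exp(sigma * sqrt(2 * dt))`) are all assumptions made for illustration.

```python
import math


def trinomial_european_call(s0, strike, rate, sigma, maturity, steps):
    """Price a European call on a trinomial lattice (illustrative baseline only).

    Assumption: standard textbook parameterization; the paper's own
    formulation and blocking scheme are not given in the abstract.
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(2.0 * dt))  # up factor; down factor is 1 / up

    # Risk-neutral probabilities of the up, down, and middle moves.
    a = math.exp(rate * dt / 2.0)
    b = math.exp(sigma * math.sqrt(dt / 2.0))
    p_up = ((a - 1.0 / b) / (b - 1.0 / b)) ** 2
    p_down = ((b - a) / (b - 1.0 / b)) ** 2
    p_mid = 1.0 - p_up - p_down
    disc = math.exp(-rate * dt)

    # Terminal payoffs: index i corresponds to net up-moves j = i - steps.
    values = [
        max(s0 * up ** (i - steps) - strike, 0.0)
        for i in range(2 * steps + 1)
    ]

    # Backward induction: each node reads three adjacent nodes
    # (down, middle, up) from the later time level.
    for level in range(steps - 1, -1, -1):
        values = [
            disc * (p_down * values[i] + p_mid * values[i + 1] + p_up * values[i + 2])
            for i in range(2 * level + 1)
        ]
    return values[0]


if __name__ == "__main__":
    print(trinomial_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200))
```

Each time level depends only on three neighbouring values of the next level, so a naive sweep like this one streams the whole array through the memory hierarchy once per step. That access pattern is the kind of memory traffic between cache levels that the abstract says the UMM is used to analyze.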

Updated: 2020-09-25