ChunkedTejas
ACM Transactions on Modeling and Computer Simulation (IF 0.7), Pub Date: 2020-06-01, DOI: 10.1145/3375397
Rajshekar Kalayappan 1, Avantika Chhabra 2, Smruti R. Sarangi 2

Research in computer architecture is commonly done using software simulators. The simulation speed of such simulators is therefore critical to the rate of progress in research. One of the less commonly used ways to increase the simulation speed is to decompose the benchmark’s execution into contiguous chunks of instructions and simulate these chunks in parallel. Two issues arise from this approach. The first is of correctness, as each chunk (other than the first chunk) starts from an incorrect state. The second is of performance: The decomposition must be done in such a way that the simulation of all chunks finishes at nearly the same time, allowing for maximum speedup. In this article, we study these two aspects and compare three different chunking approaches (two of them are novel) and two warmup approaches (one of them is novel). We demonstrate that average speedups of up to 5.39X can be achieved (while employing eight parallel instances), while constraining the error to 0.2% on average.
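
The two concerns named in the abstract, balanced chunk finish times and a warmup period before each chunk, can be illustrated with a small sketch. This is not the chunking or warmup scheme evaluated in the article: the greedy equal-cost split, the function names, and the assumption that each interval has a known cost estimate and that warmup intervals cost as much as measured ones are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def balanced_chunks(costs: List[float], n_chunks: int) -> List[Tuple[int, int]]:
    """Split per-interval cost estimates into contiguous chunks of roughly
    equal total cost. Returns half-open (start, end) index pairs.
    Hypothetical greedy heuristic, not the article's algorithm."""
    total = sum(costs)
    bounds: List[Tuple[int, int]] = []
    start = 0
    running = 0.0
    for i, cost in enumerate(costs):
        running += cost
        # Close the current chunk once the cumulative cost reaches the
        # next equal-share target (1/n, 2/n, ... of the total).
        target = total * (len(bounds) + 1) / n_chunks
        if len(bounds) < n_chunks - 1 and running >= target:
            bounds.append((start, i + 1))
            start = i + 1
    bounds.append((start, len(costs)))  # last chunk takes the remainder
    return bounds


def with_warmup(chunks: List[Tuple[int, int]], warmup: int) -> List[Tuple[int, int, int]]:
    """Prepend up to `warmup` intervals to each chunk so that simulator
    state is approximately correct when measurement begins.
    Returns (warmup_start, measure_start, end) triples."""
    return [(max(0, s - warmup), s, e) for s, e in chunks]


def estimated_speedup(costs: List[float], chunks: List[Tuple[int, int, int]]) -> float:
    """Serial cost divided by the cost of the slowest parallel instance.
    Assumes warmup intervals cost the same as measured ones."""
    serial = sum(costs)
    slowest = max(sum(costs[w:e]) for w, _, e in chunks)
    return serial / slowest


costs = [2.0, 2.0, 2.0, 2.0]
chunks = with_warmup(balanced_chunks(costs, 2), warmup=1)
speedup = estimated_speedup(costs, chunks)
```

In this toy input the second chunk re-simulates one warmup interval, so the estimated speedup on two instances falls below 2. By the same measure, the reported 5.39X on eight parallel instances corresponds to a parallel efficiency of about 67%.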

Updated: 2020-06-01