Analytical Performance Models for NoCs with Multiple Priority Traffic Classes
arXiv - CS - Performance Pub Date : 2019-08-07 , DOI: arxiv-1908.02408
Sumit K. Mandal, Raid Ayoub, Michael Kishinevsky, Umit Y. Ogras

Networks-on-chip (NoCs) have become the standard interconnect solution in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations, since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Our approach introduces two novel transformations of the queuing system, together with an algorithm that applies them iteratively to decompose the given NoC into individual queues with modified service times. This decomposition enables accurate and scalable computation of end-to-end latency. Experimental evaluations using real architectures and applications show high accuracy of 97% and up to 2.5x speedup in full-system simulation.
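
To illustrate why a fair-arbitration model cannot stand in for a priority scheduler, the sketch below compares mean queueing delay in a single queue under first-come-first-served service and under non-preemptive priority service. It uses the textbook continuous-time M/G/1 formulas (Pollaczek-Khinchine and Cobham), which are an assumption made here for illustration only; it is not the pair of transformations or the decomposition algorithm the abstract refers to, and the function names and traffic values are hypothetical.

```python
def fcfs_waiting_time(arrival_rates, mean_service, second_moment_service):
    """Mean queueing delay of an M/G/1 queue with FCFS (fair) service."""
    rho = sum(lam * s for lam, s in zip(arrival_rates, mean_service))
    if rho >= 1.0:
        raise ValueError("queue is unstable: total utilization >= 1")
    # Mean residual service time seen by an arriving packet.
    residual = 0.5 * sum(
        lam * s2 for lam, s2 in zip(arrival_rates, second_moment_service)
    )
    return residual / (1.0 - rho)


def priority_waiting_times(arrival_rates, mean_service, second_moment_service):
    """Mean queueing delay per class in a non-preemptive priority M/G/1 queue.

    Index 0 is the highest-priority class (Cobham's formula).
    """
    rho = [lam * s for lam, s in zip(arrival_rates, mean_service)]
    if sum(rho) >= 1.0:
        raise ValueError("queue is unstable: total utilization >= 1")
    residual = 0.5 * sum(
        lam * s2 for lam, s2 in zip(arrival_rates, second_moment_service)
    )
    waits = []
    sigma_prev = 0.0  # utilization of all strictly higher-priority classes
    for r in rho:
        sigma = sigma_prev + r  # utilization including this class
        waits.append(residual / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


# Hypothetical traffic: two classes, deterministic service time of 1 cycle.
rates = [0.2, 0.3]      # packets per cycle, high priority first
mean_s = [1.0, 1.0]     # E[S]
second_s = [1.0, 1.0]   # E[S^2]

fair = fcfs_waiting_time(rates, mean_s, second_s)
prio = priority_waiting_times(rates, mean_s, second_s)
print(f"FCFS delay for every class: {fair:.4f} cycles")
for k, w in enumerate(prio):
    print(f"priority class {k}: {w:.4f} cycles")
```

With these inputs the fair model assigns the same delay to both classes, while the priority model separates them: the high-priority class waits less and the low-priority class waits more. That per-class gap is what a fair-arbitration model cannot express.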

Updated: 2020-01-07