Coherence Traffic in Manycore Processors with Opaque Distributed Directories
arXiv - CS - Hardware Architecture. Pub Date: 2020-11-10, DOI: arxiv-2011.05422
Steve Kommrusch, Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño

Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Intel Mesh interconnect, which consists of a network-on-chip interconnecting "tiles", each of which contains computation cores, local caches, and coherence masters. The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency. This paper studies the physical layout of an Intel Knights Landing processor, with a particular focus on the coherence subsystem, and uncovers the pseudo-random mapping function of physical memory blocks across the pieces of the distributed directory. Leveraging this knowledge, candidate optimizations to improve memory latency through the minimization of coherence traffic are studied. Although these optimizations do improve memory throughput, ultimately this does not translate into performance gains due to inherent overheads stemming from the computational complexity of the mapping functions.

Updated: 2020-11-12