Addressing multiple bit/symbol errors in DRAM subsystem
PeerJ Computer Science (IF 3.5), Pub Date: 2021-02-09, DOI: 10.7717/peerj-cs.359
Ravikiran Yeleswarapu, Arun K. Somani

As DRAM technology continues to evolve towards smaller feature sizes and higher densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may fail to detect multi-symbol errors caused by faults in multiple devices and/or in the data bus or address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error-handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme combines a hash with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). We develop a scheme that deploys a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for a ×4-based DDR4 system. Simulation-based experiments show that our scheme effectively guards against device, data-bus, and address-bus errors, limited only by the aliasing probability of the hash. Our design achieves this without introducing additional READ latency. The scheme requires 19 chips per rank, 76 data-bus lines, and additional hash logic at the memory controller.
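
The hash-plus-ECC idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: it assumes one-byte symbols over GF(2^8) with the primitive polynomial 0x11D, a single-symbol-correcting code with two check symbols, an 8-byte data word, and Python's `zlib.crc32` as the 32-bit hash. All function names (`encode`, `decode`, `aliasing_double_error`) are hypothetical, and address-bus faults are not modeled. It shows only the core point: a two-symbol error can fool a single-symbol corrector into a wrong "correction", and the hash check is what turns that silent corruption into a detected error.

```python
import zlib

# GF(2^8) log/antilog tables for the primitive polynomial 0x11D (assumed choice).
EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    # b must be non-zero
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def syndromes(word):
    """S0 = sum of symbols, S1 = sum of symbol * alpha^position (both in GF(2^8))."""
    s0 = s1 = 0
    for pos, sym in enumerate(word):
        s0 ^= sym
        s1 ^= gf_mul(sym, EXP[pos])
    return s0, s1

def encode(data):
    """Return [check0, check1] + data + CRC-32(data), one byte per symbol."""
    payload = data + zlib.crc32(data).to_bytes(4, "big")
    a, b = syndromes(bytes(2) + payload)      # syndromes with check symbols zeroed
    c1 = gf_div(a ^ b, EXP[0] ^ EXP[1])       # solve S0 = 0 and S1 = 0 for the checks
    return bytearray([a ^ c1, c1]) + payload

def decode(word, use_hash=True):
    """Return (status, data); status is 'clean', 'corrected' or 'detected'."""
    w = bytearray(word)
    s0, s1 = syndromes(w)
    status = "clean"
    if s0 or s1:
        if s0 == 0 or s1 == 0:                # no single-symbol error fits
            return "detected", None
        pos = (LOG[s1] - LOG[s0]) % 255       # position implied by S1 / S0
        if pos >= len(w):
            return "detected", None
        w[pos] ^= s0                          # apply the single-symbol correction
        status = "corrected"
    data = bytes(w[2:-4])
    if use_hash and zlib.crc32(data) != int.from_bytes(w[-4:], "big"):
        return "detected", None               # hash mismatch: ECC result rejected
    return status, data

def aliasing_double_error(i, j, p, e_i=1):
    """Error value at symbol j that makes errors at i and j mimic one error at p."""
    return gf_mul(e_i, gf_div(EXP[i] ^ EXP[p], EXP[j] ^ EXP[p]))

if __name__ == "__main__":
    data = b"DRAMbeat"
    good = encode(data)

    single = bytearray(good)
    single[7] ^= 0x5A                         # one faulty symbol
    print(decode(single))

    double = bytearray(good)                  # two faulty symbols, crafted to alias
    double[4] ^= 1
    double[5] ^= aliasing_double_error(4, 5, 6)
    print(decode(double, use_hash=False))     # ECC alone: silent miscorrection
    print(decode(double))                     # ECC + hash: flagged as detected
```

In this toy setup the single-symbol fault is corrected, while the crafted two-symbol fault is "corrected" to wrong data when the hash is ignored and reported as detected when the hash is checked. The residual risk is a corrupted word whose hash happens to match, which is the aliasing probability the abstract refers to.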

Updated: 2021-02-09