Current journal: IEEE Computer Architecture Letters
  • MCsim: An Extensible DRAM Memory Controller Simulator
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-09
    Reza Mirosanlou; Danlu Guo; Mohamed Hassan; Rodolfo Pellizzoni

    Numerous memory controller (MC) designs have been proposed to the research community. Interest has since grown in the computer architecture and real-time systems communities in improving system throughput and/or guaranteeing timing requirements through novel scheduling algorithms. Consequently, comprehensive simulators are in high demand since they provide an infrastructure for

  • FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-29
    Jie Zhang; Miryeong Kwon; Sanghyun Han; Nam Sung Kim; Mahmut Kandemir; Myoungsoo Jung

    Host-side page victimizations can easily overflow the SSD internal buffer, which interferes with the I/O services of diverse user applications, thereby degrading the user-level experience. To address this, we propose FastDrain, a co-design of the OS kernel and flash firmware that avoids the buffer overflow caused by page victimizations. Specifically, FastDrain can detect a triggering point where a near-future page victimization

  • The Entangling Instruction Prefetcher
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-16
    Alberto Ros; Alexandra Jimborean

    Prefetching instructions is a fundamental technique for designing high-performance computers. There are three key properties to consider when designing an efficient and effective prefetcher: timeliness, coverage, and accuracy. Timeliness is an essential property, as bringing instructions too early increases the risk of the instructions being evicted from the cache before their use while requesting

  • Value Locality Based Approximation With ODIN
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-15
    Rahul Singh; Gokul Subramanian Ravi; Mikko Lipasti; Joshua San Miguel

    Applications suited to approximation often exhibit significant value locality, in terms of both inputs and outcomes. In this early-stage proposal, ODIN (Outcome Driven Input Navigated), an approach to value locality based approximation, we hypothesize that value locality based optimizations for approximate applications should be driven by outcomes, i.e., the result of the computation, but navigated

  • Probability-Based Address Translation for Flash SSDs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-02
    Junsu Im; Hanbyeol Kim; Yumin Won; Jiho Oh; Minjae Kim; Sungjin Lee

    Thanks to advances in NAND scaling technologies, ultra-scale SSDs (e.g., >100 TB) have been introduced to the market. This rapid increase in SSD capacity, however, comes at the cost of more DRAM, which resides in the SSD controller for logical-to-physical (L2P) address translation. Many have proposed address translation algorithms that reduce DRAM, but they fail to provide short read latency, in
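A back-of-the-envelope sketch of why capacity drives controller DRAM, assuming a flat page-level L2P table with one entry per 4 KiB logical page, entries just wide enough to hold a physical page number, and no over-provisioning. These parameters are illustrative assumptions, not figures from the letter.

```python
def l2p_table_bytes(capacity_bytes: int, page_bytes: int = 4096) -> int:
    """Size of a flat page-level logical-to-physical mapping table.

    Assumptions (illustrative): one entry per logical page, each entry rounded
    up to whole bytes, physical page count equal to logical page count.
    """
    num_pages = -(-capacity_bytes // page_bytes)        # ceiling division
    addr_bits = max(1, (num_pages - 1).bit_length())   # bits to name any physical page
    entry_bytes = -(-addr_bits // 8)                    # round entry up to whole bytes
    return num_pages * entry_bytes


if __name__ == "__main__":
    one_tib = 2**40
    print(l2p_table_bytes(one_tib) // 2**20, "MiB for 1 TiB")      # 4-byte entries
    hundred_tb = 100 * 10**12
    print(f"{l2p_table_bytes(hundred_tb) / 10**9:.1f} GB for 100 TB")  # 5-byte entries
```

Under these assumptions 1 TiB needs 1 GiB of table, and 100 TB needs about 122.1 GB, because 4-byte entries can no longer name every page.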

  • The Case for Domain-Specialized Branch Predictors for Graph-Processing
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-30
    Ahmed Samara; James Tuck

    Branch prediction is believed by many to be a solved problem, with state-of-the-art predictors achieving near-perfect prediction for many programs. In this article, we conduct a detailed simulation of graph-processing workloads in the GAPBS benchmark suite and show that branch mispredictions occur frequently and are still a large limitation on performance in key graph-processing applications. We provide
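To make "misprediction" concrete, here is a minimal sketch of a generic bimodal predictor (a table of 2-bit saturating counters indexed by PC). It is a textbook design, not the predictor studied or proposed in the letter, and the "data-dependent" branch below is a hypothetical stand-in for a graph-processing check such as "is this neighbor already visited?".

```python
import random


class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch PC (textbook design)."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries  # 0,1 predict not-taken; 2,3 predict taken

    def predict(self, pc: int) -> bool:
        return self.counters[pc % self.entries] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredict_rate(outcomes, pc: int = 0x400) -> float:
    """Fraction of outcomes the predictor gets wrong for one static branch."""
    bp = BimodalPredictor()
    misses = 0
    for taken in outcomes:
        if bp.predict(pc) != taken:
            misses += 1
        bp.update(pc, taken)
    return misses / len(outcomes)


if __name__ == "__main__":
    loop_branch = [True] * 999 + [False]  # regular loop back-edge
    rng = random.Random(42)
    data_branch = [rng.random() < 0.5 for _ in range(1000)]  # hypothetical data-dependent branch
    print(mispredict_rate(loop_branch))
    print(mispredict_rate(data_branch))
```

The loop branch mispredicts only on warm-up and exit, while the data-dependent branch hovers near 50%, which is the kind of behavior a history-agnostic counter cannot learn.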

  • A Two-Directional BigData Sorting Architecture on FPGAs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-07
    Bo-Cheng Lai; Chun-Yen Chen; Yi-Da Hsin; Bo-Yen Lin

    Sorting is pivotal to data analytics and becomes challenging as computation intensifies on drastically growing data volumes. Sorting on FPGAs has shown superior throughput, but the limited in-system memory causes massive data transfers to/from external storage when handling a large dataset. We propose a two-directional sorting (2DSort) architecture which sorts data sequences on both horizontal and vertical

  • NMTSim: Transaction-Command Based Simulator for New Memory Technology Devices
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-18
    Peng Gu; Benjamin S. Lim; Wenqin Huangfu; Krishan T. Malladi; Andrew Chang; Yuan Xie

    To mitigate the impact of non-deterministic media access latencies in new memory technology devices, a recently proposed Non-Volatile Dual In-line Memory Module (NVDIMM) standard, NVDIMM-P, uses novel out-of-order transaction commands. Previous DRAM simulators cannot support this transaction protocol because they assume deterministic DDR timing. Also, existing NVDIMM simulators are customized for NAND

  • HiLITE: Hierarchical and Lightweight Imitation Learning for Power Management of Embedded SoCs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-13
    Anderson L. Sartor; Anish Krishnakumar; Samet E. Arda; Umit Y. Ogras; Radu Marculescu

    Modern systems-on-chip (SoCs) use dynamic power management (DPM) techniques to improve energy efficiency. However, existing techniques are unable to efficiently adapt the runtime decisions considering multiple objectives (e.g., energy and real-time requirements) simultaneously on heterogeneous platforms. To address this need, we propose HiLITE, a hierarchical imitation learning framework that maximizes

  • Heterogeneous 3D Integration for a RISC-V System With STT-MRAM
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-06
    Lingjun Zhu; Lennart Bamberg; Anthony Agnesina; Francky Catthoor; Dragomir Milojevic; Manu Komalan; Julien Ryckaert; Alberto Garcia-Ortiz; Sung Kyu Lim

    Spin Torque Transfer Magnetic RAM (STT-MRAM) is a promising Non-Volatile Memory (NVM) technology achieving high density, low leakage power, and relatively small read/write delays. It provides a solution to improve the performance and to mitigate the leakage power consumption compared to SRAM-based processors. However, the process heterogeneity and the sophisticated back-end-of-line (BEOL) structure

  • NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-27
    Seyyed Hossein SeyyedAghaei Rezaei; Mehdi Modarressi; Rachata Ausavarungnirun; Mohammad Sadrosadati; Onur Mutlu; Masoud Daneshtalab

    Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to directly

  • A Power-Aware Heterogeneous Architecture Scaling Model for Energy-Harvesting Computers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-24
    Harsh Desai; Brandon Lucia

    Energy-harvesting devices are the key to enabling future ubiquitous sensing applications, because they are long lived and require little maintenance. On-device processing of sensed data, such as images, avoids the high energy cost of communicating data to the edge or cloud. This letter observes that the on-device computing performance of an energy-harvesting system depends not only on execution time

  • Architectural Implications of Graph Neural Networks
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-21
    Zhihui Zhang; Jingwen Leng; Lingxiao Ma; Youshan Miao; Chao Li; Minyi Guo

    Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures. They are becoming increasingly popular due to the high accuracy they achieve in many graph-related tasks. However, GNNs are not as well understood in the systems and architecture community as their counterparts such as multi-layer perceptrons and convolutional neural networks. This letter tries to

  • Unexpected Performance of Intel® Optane™ DC Persistent Memory
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-20
    Tony Mason; Thaleia Dimitra Doudali; Margo Seltzer; Ada Gavrilovska

    We evaluated Intel® Optane™ DC Persistent Memory and found that Intel's persistent memory is highly sensitive to data locality, size, and access patterns, which becomes clearer when optimizing both virtual memory page size and data layout for locality. Using the Polybench high-performance computing benchmark suite and controlling for mapped page size, we evaluate persistent memory (PMEM) performance

  • Network Packet Processing Mode-Aware Power Management for Data Center Servers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-01
    Ki-Dong Kang; Gyeongseo Park; Nam Sung Kim; Daehoon Kim

    In data center servers, power management (PM) exploiting Dynamic Voltage and Frequency Scaling (DVFS) for processors can play a crucial role in improving energy efficiency. However, we observe that current PM policies (i.e., governors) not only considerably increase tail response time (i.e., violate a given Service Level Objective (SLO)) but also hurt energy efficiency. Tackling limitations of current

  • Brutus: Refuting the Security Claims of the Cache Timing Randomization Countermeasure Proposed in CEASER
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-06
    Rahul Bodduna; Vinod Ganesan; Patanjali SLPSK; Kamakoti Veezhinathan; Chester Rebeiro

    Cache timing attacks are a serious threat to the security of computing systems. They permit sensitive information, such as cryptographic keys, to leak across virtual machines and even to remote servers. Encrypted Address Cache, proposed by CEASER (a best paper candidate at MICRO 2018), is a promising countermeasure that stymies the timing channel by employing cryptography to randomize the cache address
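The idea of randomizing the cache address with cryptography can be sketched as: encrypt the line address under a secret key, then take the set index from the ciphertext. The toy 4-round Feistel network below (BLAKE2b as the keyed round function, 32-bit line addresses, 1024 sets) is an illustrative assumption; it is not CEASER's actual cipher and makes no security claim.

```python
import hashlib


def _round(half: int, key: bytes, rnd: int) -> int:
    """Keyed round function producing 16 bits (illustrative choice, not CEASER's)."""
    h = hashlib.blake2b(half.to_bytes(2, "little") + bytes([rnd]), key=key, digest_size=2)
    return int.from_bytes(h.digest(), "little")


def encrypt_line_addr(line_addr: int, key: bytes, rounds: int = 4) -> int:
    """Toy Feistel permutation over 32-bit cache-line addresses (a bijection for any key)."""
    left, right = (line_addr >> 16) & 0xFFFF, line_addr & 0xFFFF
    for r in range(rounds):
        left, right = right, left ^ _round(right, key, r)
    return (left << 16) | right


def set_index(line_addr: int, key: bytes, num_sets: int = 1024) -> int:
    """Set index taken from the encrypted, rather than the raw, line address."""
    return encrypt_line_addr(line_addr, key) % num_sets


if __name__ == "__main__":
    for addr in range(4):  # consecutive lines scatter across sets
        print(addr, set_index(addr, b"key-epoch-0"), set_index(addr, b"key-epoch-1"))
```

Because a Feistel network is a permutation, distinct lines never collide in the encrypted address, and rekeying changes which lines share a set.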

  • Towards Scalable Analytics with Inference-Enabled Solid-State Drives
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-23
    Minsub Kim; Jaeha Kung; Sungjin Lee

    In this paper, we propose a novel storage architecture, called an Inference-Enabled SSD (IESSD), which employs FPGA-based DNN inference accelerators inside an SSD. IESSD is capable of performing DNN operations inside an SSD, avoiding frequent data movements between application servers and data storage. This boosts the analytics performance of DNN applications. Moreover, by placing accelerators near

  • Challenges in Detecting an “Evasive Spectre”
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-02-24
    Congmiao Li; Jean-Luc Gaudiot

    Spectre attacks exploit serious vulnerabilities in modern CPU design to extract sensitive data through side channels. Completely fixing the problem would require a redesign of the architecture for conditional execution which cannot be backported. Researchers have proposed to detect Spectre with promising accuracy by monitoring deviations in microarchitectural events using existing hardware performance

  • Characterizing and Understanding GCNs on GPU
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-30
    Mingyu Yan; Zhaodong Chen; Lei Deng; Xiaochun Ye; Zhimin Zhang; Dongrui Fan; Yuan Xie

    Graph convolutional neural networks (GCNs) have achieved state-of-the-art performance on graph-structured data analysis. As with traditional neural networks, GCN training and inference are accelerated with GPUs. Therefore, characterizing and understanding the execution pattern of GCNs on GPUs is important for both software and hardware optimization. Unfortunately, to the best of our knowledge, there

  • Post-Silicon Microarchitecture
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-09
    Chanchal Kumar; Aayush Chaudhary; Shubham Bhawalkar; Utkarsh Mathur; Saransh Jain; Adith Vastrad; Eric Rotenberg

    Microprocessors are designed to provide good general performance across a range of benchmarks. As such, microarchitectural techniques which provide good speedup for only a small subset of applications are not attractive when designing a general-purpose core. We propose coupling a reconfigurable fabric with the CPU, on the same chip, via a simple and flexible interface to allow post-silicon development

  • Breaking In-Order Branch Miss Recovery
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-13
    Stijn Eyerman; Wim Heirman; Sam Van den Steen; Ibrahim Hur

    Despite very accurate branch predictors, branch misses remain an important performance limiter, especially for irregular applications. To ensure in-order commit, branch miss recovery is done in order: all instructions after the oldest branch miss are flushed, even if they eventually reconverge with the correct path. We propose a technique to limit flushing to real wrong-path instructions
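The difference between conventional in-order recovery and limiting the flush to real wrong-path instructions can be sketched over a toy reorder buffer. The `Inst` record and both functions are hypothetical; the "selective" version is idealized (it ignores data dependences on squashed instructions and the insertion of correct-path instructions) and is not the letter's mechanism.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    seq: int          # program-order sequence number
    wrong_path: bool  # fetched down the mispredicted path, before reconvergence


def conventional_flush(rob, branch_seq):
    """In-order recovery: drop every instruction younger than the mispredicted branch."""
    return [inst for inst in rob if inst.seq <= branch_seq]


def selective_flush(rob, branch_seq):
    """Idealized alternative: drop only wrong-path instructions and keep
    control-independent ones after the reconvergence point."""
    return [inst for inst in rob if inst.seq <= branch_seq or not inst.wrong_path]


if __name__ == "__main__":
    # Branch at seq 3 mispredicted; seq 4-6 are wrong-path; seq 7-9 follow reconvergence.
    rob = [Inst(s, 4 <= s <= 6) for s in range(10)]
    print([i.seq for i in conventional_flush(rob, 3)])
    print([i.seq for i in selective_flush(rob, 3)])
```

In this example the conventional flush discards six instructions, three of which would have executed identically on the correct path.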

  • Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-12
    Zhi-Gang Liu; Paul N. Whatmough; Matthew Mattina

    Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs), with very efficient local data movement, well suited to accelerating GEMM, and widely deployed in industry. In this letter, we describe two significant improvements
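A minimal cycle-level sketch of the baseline the abstract describes: an output-stationary systolic array computing C = A·B, where rows of A flow right, columns of B flow down, and each PE multiplies and accumulates its own element of C. It models plain integer GEMM only; it does not capture INT8 datapaths, structured sparsity, or the letter's Systolic Tensor Array improvements.

```python
def systolic_matmul(A, B):
    """Simulate an M x N output-stationary systolic array computing A (M x K) @ B (K x N)."""
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    assert K == K2, "inner dimensions must match"
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(M)]  # value each PE forwards to its right neighbor
    b_reg = [[0] * N for _ in range(M)]  # value each PE forwards to its lower neighbor
    # Row i of A enters skewed by i cycles, column j of B by j cycles, so the last
    # PE (M-1, N-1) sees its final operand pair at cycle M + N + K - 3.
    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:  # left edge: feed skewed row i of A (zero-padded)
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:       # interior: take the value the left neighbor held last cycle
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: feed skewed column j of B (zero-padded)
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:       # interior: take the value the upper neighbor held last cycle
                    new_b[i][j] = b_reg[i - 1][j]
                C[i][j] += new_a[i][j] * new_b[i][j]  # local multiply-accumulate
        a_reg, b_reg = new_a, new_b  # all PEs latch synchronously
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Only neighbor-to-neighbor register transfers occur inside the array, which is the "very efficient local data movement" the abstract credits the SA with.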

  • A High-Performance Design of Generalized Pipeline Cellular Array
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-08
    Zhufei Chu; Huiming Tian; Zeqiang Li; Yinshui Xia; Lunyao Wang

    In this letter, we propose a high-performance quantum-dot cellular automata (QCA) design of the generalized pipeline cellular array (GPCA). The GPCA can perform all the basic arithmetic operations using only one arithmetic cell. Due to its flexibility, the high-performance GPCA design is of high interest for large-scale QCA designs. We propose both the arithmetic unit and control unit designs of GPCA

  • Exploiting Thermal Transients With Deterministic Turbo Clock Frequency
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-30
    Pierre Michaud

    Modern microprocessors feature turbo mechanisms that adjust the clock frequency dynamically so as to maximize processor performance under power and temperature limits. However, the documentation for commercial chips rarely provides more than a superficial description of how turbo works. This letter highlights certain aspects of turbo that are not well known outside the industry and that distinguish

  • The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-16
    Srivatsan Krishnan; Zishen Wan; Kshitij Bhardwaj; Paul Whatmough; Aleksandra Faust; Gu-Yeon Wei; David Brooks; Vijay Janapa Reddi

    We introduce the “Formula-1” (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). F-1 serves as a tool that can aid computer and cyber-physical system architects to understand
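For readers unfamiliar with the roofline idea that F-1 is named after, here is a sketch of the classic compute/memory roofline: attainable throughput is capped either by peak compute or by memory bandwidth times operational intensity. F-1 is described as relating sensor framerate, compute performance, and body dynamics instead; this sketch models none of those robot-level quantities, and the numbers in the usage line are arbitrary.

```python
def roofline(peak_flops: float, mem_bw_bytes_per_s: float,
             intensity_flops_per_byte: float) -> float:
    """Attainable FLOP/s under the classic roofline: min(compute roof, bandwidth slope)."""
    return min(peak_flops, mem_bw_bytes_per_s * intensity_flops_per_byte)


def ridge_point(peak_flops: float, mem_bw_bytes_per_s: float) -> float:
    """Operational intensity (FLOP/byte) where the bandwidth slope meets the compute roof."""
    return peak_flops / mem_bw_bytes_per_s


if __name__ == "__main__":
    peak, bw = 100e9, 10e9  # arbitrary example machine: 100 GFLOP/s, 10 GB/s
    for oi in (1.0, 10.0, 50.0):
        print(oi, roofline(peak, bw, oi))
```

Left of the ridge point a workload is bandwidth-bound and right of it compute-bound; the same "which component caps performance" question is what a roofline-style model answers.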

  • Exploring Prefetching, Pre-Execution and Branch Outcome Streaming for In-Memory Database Lookups
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-12-16
    Mustafa Cavus; Mohammed Shatnawi; Resit Sendag; Augustus K. Uht

    Lookup operations for in-memory databases are heavily memory-bound because they often rely on pointer-chasing linked data structure traversals. They are also branch heavy with branches that are hard-to-predict due to random key lookups. In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions, only

Contents have been reproduced by permission of the publishers.