Current journal: IEEE Computer Architecture Letters
  • Flexion: A Quantitative Metric for Flexibility in DNN Accelerators
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-14
    Hyoukjun Kwon; Michael Pellauer; Angshuman Parashar; Tushar Krishna

    Dataflow and tile size choices, which we collectively refer to as mappings, dictate the efficiency (i.e., latency and energy) of DNN accelerators. The rapid evolution of DNN models is one of the major challenges for DNN accelerators since the optimal mapping heavily depends on the layer shape and size. To maintain high efficiency across multiple DNN models, flexible accelerators that can support multiple

  • TRiM: Tensor Reduction in Memory
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-07
    Byeongho Kim; Jaehyun Park; Eojin Lee; Minsoo Rhu; Jung Ho Ahn

    Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of recommendation systems consists of what are known as embedding layers, which exhibit highly memory-intensive characteristics. Fundamental primitives of embedding layers are the embedding vector gathers followed by vector reductions, which exhibit low arithmetic

  • Fine-Grained Scheduling in Heterogeneous-ISA Architectures
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-15
    Nirmal Kumar Boran; Shubhankit Rathore; Meet Udeshi; Virendra Singh

    Given the ever-increasing demand for improved computational capabilities, heterogeneous-ISA multi-core architectures have emerged as a promising alternative to improve single-threaded performance. Such architectures comprise multiple cores that differ not just in micro-architectural parameters but also in their Instruction Set Architectures (ISAs). Programs have affinity towards different ISAs during

  • A Day In the Life of a Quantum Error
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-17
    Salonik Resch; Swamit Tannu; Ulya R. Karpuzcu; Moinuddin Qureshi

    When a fault occurs in a computational system, it is of interest what effects it has. If it is known what can go wrong, it may also be known how to mitigate or correct for it. While complex, it is possible to obtain this information with rigorous fault injection in classical systems. This is also desirable for quantum systems; unfortunately, it is much more difficult. The exponential information content

  • A Lightweight Memory Access Pattern Obfuscation Framework for NVM
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-01
    Yuezhi Che; Yuanzhou Yang; Amro Awad; Rujia Wang

    Emerging Non-Volatile Memories (NVMs) are entering the mainstream market. With attractive performance, high density, and near-zero idle power, emerging NVMs are promising contenders to build future memory systems. On the other hand, their limited write endurance ($10^6$ to $10^8$ write cycles) and the data remanence attacks they enable remain the main challenges that could hinder the wide adoption of NVMs

  • Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-12-03
    Elaheh Sadredini; Reza Rahimi; Kevin Skadron

    The demand for accelerated pattern matching has motivated several recent in-memory accelerator architectures for automata processing, which is an efficient computation model for sophisticated pattern matching. Existing in-memory pattern matching architectures focus on accelerating the pattern matching kernel, but either fail to support a practical reporting solution or overlook the reporting stage

  • Rebasing Instruction Prefetching: An Industry Perspective
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-10-30
    Yasuo Ishii; Jaekyu Lee; Krishnendra Nathella; Dam Sunwoo

    Instruction prefetching can play a pivotal role in improving the performance of workloads with large instruction footprints and frequent, costly frontend stalls. In particular, Fetch Directed Prefetching (FDP) is an effective technique to mitigate frontend stalls since it leverages existing branch prediction resources in a processor and incurs very little hardware overhead. Modern processors have been

  • PIM-GraphSCC: PIM-Based Graph Processing Using Graph’s Community Structures
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-11-20
    Newton; Virendra Singh; Trevor E. Carlson

    Graphs are used to store relationships on a variety of topics, such as road map data and social media connections. Processing this data allows one to uncover insights from its structure. However, while analyzing graphs with traditional processors, the graph connectivity can result in irregular memory access patterns and thus poor data locality that can result in low performance. Processing-in-Memory

  • Voltage Noise Mitigation With Barrier Approximation
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-11-24
    Zamshed I. Chowdhury; S. Karen Khatamifard; Zhaoyong Zheng; Tali Moreshet; R. Iris Bahar; Ulya R. Karpuzcu

    Barrier synchronization constructs are placed between phases of parallel programs to ensure correctness in the execution – by preventing threads from proceeding to the subsequent phases of the program before all threads have completed the preceding stage(s). Upon release, threads leaving the barrier at the same time cause sudden change in activity that can potentially lead to voltage emergencies in

  • Aging-Aware Context Switching in Multicore Processors Based on Workload Classification
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-11-24
    Ferdous Sharifi; Nezam Rohbani; Shaahin Hessabi

    As transistor dimensions continue to shrink, long-term reliability threats, such as Negative Bias Temperature Instability, affect the lifespan of multicore processors. This letter proposes a load balancing technique based on the rate of integer and floating-point instructions per workload. This technique classifies workloads into integer-majority and floating-point-majority classes and migrates workloads

  • Adapting In Situ Accelerators for Sparsity with Granular Matrix Reordering
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-10-21
    Darya Mikhailenko; Yujin Nakamoto; Ben Feinberg; Engin Ipek

    Neural network (NN) inference is an essential part of modern systems and is found at the heart of numerous applications ranging from image recognition to natural language processing. In situ NN accelerators can efficiently perform NN inference using resistive crossbars, which makes them a promising solution to the data movement challenges faced by conventional architectures. Although such accelerators

  • GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-09-14
    Ali Jahanshahi; Hadi Zamani Sabzi; Chester Lau; Daniel Wong

    Cloud inference systems have recently emerged as a solution to the ever-increasing integration of AI-powered applications into the smart devices around us. The wide adoption of GPUs in cloud inference systems has made power consumption a first-order constraint in multi-GPU systems. Thus, to achieve this goal, it is critical to have better insight into the power and performance behaviors of multi-GPU

  • Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-28
    Nikita Lazarev; Neil Adit; Shaojie Xiang; Zhiru Zhang; Christina Delimitrou

    Cloud applications are increasingly relying on hundreds of loosely-coupled microservices to complete user requests that meet an application's end-to-end QoS requirements. Communication time between services accounts for a large fraction of the end-to-end latency and can introduce performance unpredictability and QoS violations. This letter presents our early work on Dagger, a hardware acceleration

  • A Cross-Stack Approach Towards Defending Against Cryptojacking
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-18
    Nada Lachtar; Abdulrahman Abu Elkhail; Anys Bacha; Hafiz Malik

    Cryptocurrencies are revolutionizing the way we conduct everyday business. Unfortunately, cybercriminals have harnessed this technology for making profit through cryptojacking, the act of maliciously appropriating computational resources for mining cryptocurrencies. In this letter, we explore a general solution for detecting cryptojacking attacks irrespective of the application type. We propose an

  • Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-25
    Fatemeh Golshan; Mohammad Bakhshalipour; Mehran Shakerinava; Ali Ansari; Pejman Lotfi-Kamran; Hamid Sarbazi-Azad

    Recent research revisits pairwise-correlating data prefetching due to its extremely low overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where data streams end. As a result, pairwise-correlating data prefetchers either expose low accuracy or they lose timeliness when they are performing multi-degree prefetching. In this letter, we propose a novel technique to detect

  • A Study of Memory Placement on Hardware-Assisted Tiered Memory Systems
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-11
    Wonkyo Choe; Jonghyeon Kim; Jeongseob Ahn

    With recent advances in memory technology, the memory hierarchy is becoming diverse with performance-differentiated memory such as high bandwidth memory (HBM) and non-volatile memory (NVM) in modern computer systems. However, the current memory placement has been designed with the assumption that all the memory has the same capabilities based on DRAM. In this letter, we analyze memory placement schemes in state-of-the-art

  • pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-23
    Purab Ranjan Sutradhar; Mark Connolly; Sathwika Bavikadi; Sai Manoj Pudukotai Dinakarrao; Mark A. Indovina; Amlan Ganguly

    Memory access latencies and low data transfer bandwidth limit the processing speed of many data intensive applications such as Convolutional Neural Networks (CNNs) in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications as the data access bottlenecks can be avoided in PIM by performing computations within the memory

  • SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-15
    Joo Hwan Lee; Hui Zhang; Veronica Lagrange; Praveen Krishnamoorthy; Xiaodong Zhao; Yang Seok Ki

    Faced with the increasing disparity between SSD throughput and CPU-based compute capabilities, there has been growing interest in moving compute closer to storage and accelerating data analytic workloads. In this letter, we propose SmartSSD, an SSD with onboard FPGA, which enables offloading computation within SSD. We perform a detailed model-based evaluation to evaluate the end-to-end performance

  • MCsim: An Extensible DRAM Memory Controller Simulator
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-09
    Reza Mirosanlou; Danlu Guo; Mohamed Hassan; Rodolfo Pellizzoni

    Numerous proposals for memory controller (MC) designs have been exposed to the research community. Interest has since been growing in the area of computer architecture and real-time systems to improve the throughput of the system and/or guarantee timing requirements through novel scheduling algorithms. Consequently, comprehensive simulators are highly demanded since they provide an infrastructure for

  • FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-29
    Jie Zhang; Miryeong Kwon; Sanghyun Han; Nam Sung Kim; Mahmut Kandemir; Myoungsoo Jung

    Host-side page victimizations can easily overflow the SSD internal buffer, which interferes with the I/O services of diverse user applications, thereby degrading user-level experiences. To address this, we propose FastDrain, a co-design of OS kernel and flash firmware to avoid the buffer overflow caused by page victimizations. Specifically, FastDrain can detect a triggering point where a near-future page victimization

  • The Entangling Instruction Prefetcher
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-16
    Alberto Ros; Alexandra Jimborean

    Prefetching instructions is a fundamental technique for designing high-performance computers. There are three key properties to consider when designing an efficient and effective prefetcher: timeliness, coverage, and accuracy. Timeliness is an essential property, as bringing instructions too early increases the risk of the instructions being evicted from the cache before their use while requesting

  • Value Locality Based Approximation With ODIN
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-15
    Rahul Singh; Gokul Subramanian Ravi; Mikko Lipasti; Joshua San Miguel

    Applications suited to approximation often exhibit significant value locality, both in terms of inputs as well as outcomes. In this early stage proposal - the ODIN: Outcome Driven Input Navigated approach to value locality based approximation, we hypothesize that value locality based optimizations for approximate applications should be driven by outcomes i.e., the result of the computation, but navigated

  • Probability-Based Address Translation for Flash SSDs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-02
    Junsu Im; Hanbyeol Kim; Yumin Won; Jiho Oh; Minjae Kim; Sungjin Lee

    Thanks to advances in NAND scaling technologies, ultra-scale SSDs (e.g., > 100 TB) have been introduced to the market. This rapid increase of SSD capacity, however, comes at the cost of more DRAM, which resides in an SSD controller for logical-to-physical (L2P) address translation. Many have proposed various address translation algorithms to reduce DRAM, but they fail to provide short read latency, in

  • The Case for Domain-Specialized Branch Predictors for Graph-Processing
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-30
    Ahmed Samara; James Tuck

    Branch prediction is believed by many to be a solved problem, with state-of-the-art predictors achieving near-perfect prediction for many programs. In this article, we conduct a detailed simulation of graph-processing workloads in the GAPBS benchmark suite and show that branch mispredictions occur frequently and are still a large limitation on performance in key graph-processing applications. We provide

  • A Two-Directional BigData Sorting Architecture on FPGAs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-07
    Bo-Cheng Lai; Chun-Yen Chen; Yi-Da Hsin; Bo-Yen Lin

    Sorting is pivotal in data analytics and becomes challenging with intensive computation on drastically growing data volume. Sorting on FPGA has shown superior throughput, but the limited in-system memory causes vast data transfers to/from external storage when handling a large dataset. We propose a two-directional sorting (2DSort) architecture which sorts data sequences on both horizontal and vertical

  • NMTSim: Transaction-Command Based Simulator for New Memory Technology Devices
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-18
    Peng Gu; Benjamin S. Lim; Wenqin Huangfu; Krishan T. Malladi; Andrew Chang; Yuan Xie

    To mitigate the impact of non-deterministic media access latencies in new memory technology devices, a recently proposed Non-Volatile Dual In-line Memory Module (NVDIMM) standard, NVDIMM-P uses novel out-of-order transaction commands. The previous DRAM simulators are unable to support this transaction protocol due to deterministic DDR timing. Also, existing NVDIMM simulators are customized for NAND

  • HiLITE: Hierarchical and Lightweight Imitation Learning for Power Management of Embedded SoCs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-13
    Anderson L. Sartor; Anish Krishnakumar; Samet E. Arda; Umit Y. Ogras; Radu Marculescu

    Modern systems-on-chip (SoCs) use dynamic power management (DPM) techniques to improve energy efficiency. However, existing techniques are unable to efficiently adapt the runtime decisions considering multiple objectives (e.g., energy and real-time requirements) simultaneously on heterogeneous platforms. To address this need, we propose HiLITE, a hierarchical imitation learning framework that maximizes

  • Heterogeneous 3D Integration for a RISC-V System With STT-MRAM
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-06
    Lingjun Zhu; Lennart Bamberg; Anthony Agnesina; Francky Catthoor; Dragomir Milojevic; Manu Komalan; Julien Ryckaert; Alberto Garcia-Ortiz; Sung Kyu Lim

    Spin Torque Transfer Magnetic RAM (STT-MRAM) is a promising Non-Volatile Memory (NVM) technology achieving high density, low leakage power, and relatively small read/write delays. It provides a solution to improve the performance and to mitigate the leakage power consumption compared to SRAM-based processors. However, the process heterogeneity and the sophisticated back-end-of-line (BEOL) structure

  • NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-27
    Seyyed Hossein SeyyedAghaei Rezaei; Mehdi Modarressi; Rachata Ausavarungnirun; Mohammad Sadrosadati; Onur Mutlu; Masoud Daneshtalab

    Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to directly

  • A Power-Aware Heterogeneous Architecture Scaling Model for Energy-Harvesting Computers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-24
    Harsh Desai; Brandon Lucia

    Energy-harvesting devices are the key to enabling future ubiquitous sensing applications, because they are long lived and require little maintenance. On-device processing of sensed data, such as images, avoids the high energy cost of communicating data to the edge or cloud. This letter observes that the on-device computing performance of an energy-harvesting system depends not only on execution time

  • Architectural Implications of Graph Neural Networks
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-21
    Zhihui Zhang; Jingwen Leng; Lingxiao Ma; Youshan Miao; Chao Li; Minyi Guo

    Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures. They are becoming more and more popular due to the high accuracy they achieve in many graph-related tasks. However, GNNs are not as well understood in the system and architecture community as their counterparts such as multi-layer perceptrons and convolutional neural networks. This letter tries to

  • Unexpected Performance of Intel® Optane™ DC Persistent Memory
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-20
    Tony Mason; Thaleia Dimitra Doudali; Margo Seltzer; Ada Gavrilovska

    We evaluated Intel® Optane™ DC Persistent Memory and found that Intel's persistent memory is highly sensitive to data locality, size, and access patterns, which becomes clearer by optimizing both virtual memory page size and data layout for locality. Using the Polybench high-performance computing benchmark suite and controlling for mapped page size, we evaluate persistent memory (PMEM) performance

  • Network Packet Processing Mode-Aware Power Management for Data Center Servers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-01
    Ki-Dong Kang; Gyeongseo Park; Nam Sung Kim; Daehoon Kim

    In data center servers, power management (PM) exploiting Dynamic Voltage and Frequency Scaling (DVFS) for processors can play a crucial role to improve energy efficiency. However, we observe that current PM policies (i.e., governors) not only considerably increase tail response time (i.e., violate a given Service Level Objective (SLO)) but also hurt energy efficiency. Tackling limitations of current

  • Brutus: Refuting the Security Claims of the Cache Timing Randomization Countermeasure Proposed in CEASER
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-06
    Rahul Bodduna; Vinod Ganesan; Patanjali SLPSK; Kamakoti Veezhinathan; Chester Rebeiro

    Cache timing attacks are a serious threat to the security of computing systems. They permit sensitive information, such as cryptographic keys, to leak across virtual machines and even to remote servers. Encrypted Address Cache, proposed by CEASER - a best paper candidate at MICRO 2018 - is a promising countermeasure that stymies the timing channel by employing cryptography to randomize the cache address

  • Towards Scalable Analytics with Inference-Enabled Solid-State Drives
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-23
    Minsub Kim; Jaeha Kung; Sungjin Lee

    In this paper, we propose a novel storage architecture, called an Inference-Enabled SSD (IESSD), which employs FPGA-based DNN inference accelerators inside an SSD. IESSD is capable of performing DNN operations inside an SSD, avoiding frequent data movements between application servers and data storage. This boosts the analytics performance of DNN applications. Moreover, by placing accelerators near

  • Challenges in Detecting an “Evasive Spectre”
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-02-24
    Congmiao Li; Jean-Luc Gaudiot

    Spectre attacks exploit serious vulnerabilities in modern CPU design to extract sensitive data through side channels. Completely fixing the problem would require a redesign of the architecture for conditional execution which cannot be backported. Researchers have proposed to detect Spectre with promising accuracy by monitoring deviations in microarchitectural events using existing hardware performance

  • Characterizing and Understanding GCNs on GPU
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-30
    Mingyu Yan; Zhaodong Chen; Lei Deng; Xiaochun Ye; Zhimin Zhang; Dongrui Fan; Yuan Xie

    Graph convolutional neural networks (GCNs) have achieved state-of-the-art performance on graph-structured data analysis. Like traditional neural networks, training and inference of GCNs are accelerated with GPUs. Therefore, characterizing and understanding the execution pattern of GCNs on GPU is important for both software and hardware optimization. Unfortunately, to the best of our knowledge, there

  • Post-Silicon Microarchitecture
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-09
    Chanchal Kumar; Aayush Chaudhary; Shubham Bhawalkar; Utkarsh Mathur; Saransh Jain; Adith Vastrad; Eric Rotenberg

    Microprocessors are designed to provide good general performance across a range of benchmarks. As such, microarchitectural techniques which provide good speedup for only a small subset of applications are not attractive when designing a general-purpose core. We propose coupling a reconfigurable fabric with the CPU, on the same chip, via a simple and flexible interface to allow post-silicon development

  • Breaking In-Order Branch Miss Recovery
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-13
    Stijn Eyerman; Wim Heirman; Sam Van den Steen; Ibrahim Hur

    Despite very accurate branch predictors, branch misses remain an important source of performance limiters, especially for irregular applications. To ensure in-order commit, branch miss recovery is done in-order: all instructions after the oldest branch miss are flushed, even if they eventually reconverge with the correct path. We propose a technique to limit flushing to real wrong-path instructions

  • Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-12
    Zhi-Gang Liu; Paul N. Whatmough; Matthew Mattina

    Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs), with very efficient local data movement, well suited to accelerating GEMM, and widely deployed in industry. In this letter, we describe two significant improvements

  • A High-Performance Design of Generalized Pipeline Cellular Array
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-08
    Zhufei Chu; Huiming Tian; Zeqiang Li; Yinshui Xia; Lunyao Wang

    In this letter, we propose a high-performance quantum-dot cellular automata (QCA) design of generalized pipeline cellular array (GPCA). The GPCA can perform all the basic arithmetic operations using only one arithmetic cell. Due to its flexibility, the high-performance GPCA design is of high interest for large-scale QCA designs. We propose both the arithmetic unit and control unit designs of GPCA

  • Exploiting Thermal Transients With Deterministic Turbo Clock Frequency
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-30
    Pierre Michaud

    Modern microprocessors feature turbo mechanisms that adjust the clock frequency dynamically so as to maximize processor performance under power and temperature limits. However, the documentation for commercial chips rarely provides more than a superficial description of how turbo works. This letter highlights certain aspects of turbo that are not well known outside the industry and that distinguish

  • The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-16
    Srivatsan Krishnan; Zishen Wan; Kshitij Bhardwaj; Paul Whatmough; Aleksandra Faust; Gu-Yeon Wei; David Brooks; Vijay Janapa Reddi

    We introduce the “Formula-1” (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). F-1 serves as a tool that can aid computer and cyber-physical system architects to understand

  • DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-02-14
    Shang Li; Zhiyuan Yang; Dhiraj Reddy; Ankur Srivastava; Bruce Jacob

    DRAM technology has developed rapidly in recent years. Several industrial solutions offer 3D packaging of DRAM and some are envisioning the integration of CPU and DRAM on the same die. These solutions allow higher density and better performance and also lower power consumption in DRAM designs. However, accurate simulation tools have not kept up with DRAM technology, especially for the modeling of 3D

  • Exploring Prefetching, Pre-Execution and Branch Outcome Streaming for In-Memory Database Lookups
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-12-16
    Mustafa Cavus; Mohammed Shatnawi; Resit Sendag; Augustus K. Uht

    Lookup operations for in-memory databases are heavily memory-bound because they often rely on pointer-chasing linked data structure traversals. They are also branch heavy with branches that are hard-to-predict due to random key lookups. In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions, only

Contents have been reproduced by permission of the publishers.