Current journal: IEEE Computer Architecture Letters
  • Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-28
    Nikita Lazarev; Neil Adit; Shaojie Xiang; Zhiru Zhang; Christina Delimitrou

    Cloud applications are increasingly relying on hundreds of loosely-coupled microservices to complete user requests that meet an application's end-to-end QoS requirements. Communication time between services accounts for a large fraction of the end-to-end latency and can introduce performance unpredictability and QoS violations. This letter presents our early work on Dagger, a hardware acceleration

    Updated: 2020-09-30
  • A Cross-Stack Approach Towards Defending Against Cryptojacking
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-18
    Nada Lachtar; Abdulrahman Abu Elkhail; Anys Bacha; Hafiz Malik

    Cryptocurrencies are revolutionizing the way we conduct everyday business. Unfortunately, cybercriminals have harnessed this technology for making profit through cryptojacking, the act of maliciously appropriating computational resources for mining cryptocurrencies. In this letter, we explore a general solution for detecting cryptojacking attacks irrespective of the application type. We propose an

    Updated: 2020-09-25
  • Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-25
    Fatemeh Golshan; Mohammad Bakhshalipour; Mehran Shakerinava; Ali Ansari; Pejman Lotfi-Kamran; Hamid Sarbazi-Azad

    Recent research revisits pairwise-correlating data prefetching due to its extremely low overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where data streams end. As a result, pairwise-correlating data prefetchers either expose low accuracy or they lose timeliness when they are performing multi-degree prefetching. In this letter, we propose a novel technique to detect

    Updated: 2020-09-25
  • A Study of Memory Placement on Hardware-Assisted Tiered Memory Systems
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-08-11
    Wonkyo Choe; Jonghyeon Kim; Jeongseob Ahn

    With recent advances in memory technology, the memory hierarchy is becoming diverse with performance-differentiated memory such as high bandwidth memory (HBM) and non-volatile memory (NVM) in modern computer systems. However, the current memory placement has been designed with the assumption that all the memory has the same capabilities based on DRAM. In this letter, we analyze memory placement schemes in state-of-the-art

    Updated: 2020-09-01
  • pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-23
    Purab Ranjan Sutradhar; Mark Connolly; Sathwika Bavikadi; Sai Manoj Pudukotai Dinakarrao; Mark A. Indovina; Amlan Ganguly

    Memory access latencies and low data transfer bandwidth limit the processing speed of many data intensive applications such as Convolutional Neural Networks (CNNs) in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications as the data access bottlenecks can be avoided in PIM by performing computations within the memory

    Updated: 2020-08-08
  • SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-15
    Joo Hwan Lee; Hui Zhang; Veronica Lagrange; Praveen Krishnamoorthy; Xiaodong Zhao; Yang Seok Ki

    Faced with the increasing disparity between SSD throughput and CPU-based compute capabilities, there has been growing interest in moving compute closer to storage and accelerating the data analytic workloads. In this letter, we propose SmartSSD, an SSD with onboard FPGA, which enables offloading computation within SSD. We perform a detailed model-based evaluation to evaluate the end-to-end performance

    Updated: 2020-08-04
  • MCsim: An Extensible DRAM Memory Controller Simulator
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-09
    Reza Mirosanlou; Danlu Guo; Mohamed Hassan; Rodolfo Pellizzoni

    Numerous proposals for memory controller (MC) designs have been exposed to the research community. Interest has since been growing in the area of computer architecture and real-time systems to improve the throughput of the system and/or guarantee timing requirements through novel scheduling algorithms. Consequently, comprehensive simulators are highly demanded since they provide an infrastructure for

    Updated: 2020-07-31
  • FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-29
    Jie Zhang; Miryeong Kwon; Sanghyun Han; Nam Sung Kim; Mahmut Kandemir; Myoungsoo Jung

    Host-side page victimizations can easily overflow the SSD internal buffer, which interferes with the I/O services of diverse user applications, thereby degrading the user-level experience. To address this, we propose FastDrain, a co-design of OS kernel and flash firmware to avoid the buffer overflow caused by page victimizations. Specifically, FastDrain can detect a triggering point where a near-future page victimization

    Updated: 2020-07-28
  • The Entangling Instruction Prefetcher
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-16
    Alberto Ros; Alexandra Jimborean

    Prefetching instructions is a fundamental technique for designing high-performance computers. There are three key properties to consider when designing an efficient and effective prefetcher: timeliness, coverage, and accuracy. Timeliness is an essential property, as bringing instructions too early increases the risk of the instructions being evicted from the cache before their use while requesting

    Updated: 2020-07-24
  • Value Locality Based Approximation With ODIN
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-15
    Rahul Singh; Gokul Subramanian Ravi; Mikko Lipasti; Joshua San Miguel

    Applications suited to approximation often exhibit significant value locality, both in terms of inputs as well as outcomes. In this early stage proposal - the ODIN: Outcome Driven Input Navigated approach to value locality based approximation, we hypothesize that value locality based optimizations for approximate applications should be driven by outcomes i.e., the result of the computation, but navigated

    Updated: 2020-07-24
  • Probability-Based Address Translation for Flash SSDs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-07-02
    Junsu Im; Hanbyeol Kim; Yumin Won; Jiho Oh; Minjae Kim; Sungjin Lee

    Thanks to the advance of NAND scaling technologies, an ultra-scale SSD (e.g., >100 TB) is introduced to markets. This rapid increase of SSD capacity, however, comes at the cost of more DRAM which resides in an SSD controller for logical-to-physical (L2P) address translation. Many have proposed various address translation algorithms to reduce DRAM, but they fail to provide short read latency, in

    Updated: 2020-07-24
  • The Case for Domain-Specialized Branch Predictors for Graph-Processing
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-06-30
    Ahmed Samara; James Tuck

    Branch prediction is believed by many to be a solved problem, with state-of-the-art predictors achieving near-perfect prediction for many programs. In this article, we conduct a detailed simulation of graph-processing workloads in the GAPBS benchmark suite and show that branch mispredictions occur frequently and are still a large limitation on performance in key graph-processing applications. We provide

    Updated: 2020-07-24
  • A Two-Directional BigData Sorting Architecture on FPGAs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-07
    Bo-Cheng Lai; Chun-Yen Chen; Yi-Da Hsin; Bo-Yen Lin

    Sorting is pivotal in data analytics and becomes challenging with intensive computation on drastically growing data volume. Sorting on FPGA has shown superior throughput, but the limited in-system memory causes vast data transferring to/from external storage when handling a large dataset. We propose a two-directional sorting (2DSort) architecture which sorts data sequences on both horizontal and vertical

    Updated: 2020-06-22
  • NMTSim: Transaction-Command Based Simulator for New Memory Technology Devices
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-18
    Peng Gu; Benjamin S. Lim; Wenqin Huangfu; Krishan T. Malladi; Andrew Chang; Yuan Xie

    To mitigate the impact of non-deterministic media access latencies in new memory technology devices, a recently proposed Non-Volatile Dual In-line Memory Module (NVDIMM) standard, NVDIMM-P uses novel out-of-order transaction commands. The previous DRAM simulators are unable to support this transaction protocol due to deterministic DDR timing. Also, existing NVDIMM simulators are customized for NAND

    Updated: 2020-05-18
  • HiLITE: Hierarchical and Lightweight Imitation Learning for Power Management of Embedded SoCs
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-13
    Anderson L. Sartor; Anish Krishnakumar; Samet E. Arda; Umit Y. Ogras; Radu Marculescu

    Modern systems-on-chip (SoCs) use dynamic power management (DPM) techniques to improve energy efficiency. However, existing techniques are unable to efficiently adapt the runtime decisions considering multiple objectives (e.g., energy and real-time requirements) simultaneously on heterogeneous platforms. To address this need, we propose HiLITE, a hierarchical imitation learning framework that maximizes

    Updated: 2020-05-13
  • Heterogeneous 3D Integration for a RISC-V System With STT-MRAM
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-05-06
    Lingjun Zhu; Lennart Bamberg; Anthony Agnesina; Francky Catthoor; Dragomir Milojevic; Manu Komalan; Julien Ryckaert; Alberto Garcia-Ortiz; Sung Kyu Lim

    Spin Torque Transfer Magnetic RAM (STT-MRAM) is a promising Non-Volatile Memory (NVM) technology achieving high density, low leakage power, and relatively small read/write delays. It provides a solution to improve the performance and to mitigate the leakage power consumption compared to SRAM-based processors. However, the process heterogeneity and the sophisticated back-end-of-line (BEOL) structure

    Updated: 2020-05-06
  • NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-27
    Seyyed Hossein SeyyedAghaei Rezaei; Mehdi Modarressi; Rachata Ausavarungnirun; Mohammad Sadrosadati; Onur Mutlu; Masoud Daneshtalab

    Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to directly

    Updated: 2020-04-27
  • A Power-Aware Heterogeneous Architecture Scaling Model for Energy-Harvesting Computers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-24
    Harsh Desai; Brandon Lucia

    Energy-harvesting devices are the key to enabling future ubiquitous sensing applications, because they are long lived and require little maintenance. On-device processing of sensed data, such as images, avoids the high energy cost of communicating data to the edge or cloud. This letter observes that the on-device computing performance of an energy-harvesting system depends not only on execution time

    Updated: 2020-04-24
  • Architectural Implications of Graph Neural Networks
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-21
    Zhihui Zhang; Jingwen Leng; Lingxiao Ma; Youshan Miao; Chao Li; Minyi Guo

    Graph neural networks (GNN) represent an emerging line of deep learning models that operate on graph structures. It is becoming more and more popular due to its high accuracy achieved in many graph-related tasks. However, GNN is not as well understood in the system and architecture community as its counterparts such as multi-layer perceptrons and convolutional neural networks. This letter tries to

    Updated: 2020-04-21
  • Unexpected Performance of Intel® Optane™ DC Persistent Memory
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-20
    Tony Mason; Thaleia Dimitra Doudali; Margo Seltzer; Ada Gavrilovska

    We evaluated Intel® Optane™ DC Persistent Memory and found that Intel's persistent memory is highly sensitive to data locality, size, and access patterns, which becomes clearer by optimizing both virtual memory page size and data layout for locality. Using the Polybench high-performance computing benchmark suite and controlling for mapped page size, we evaluate persistent memory (PMEM) performance

    Updated: 2020-04-20
  • Network Packet Processing Mode-Aware Power Management for Data Center Servers
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-01
    Ki-Dong Kang; Gyeongseo Park; Nam Sung Kim; Daehoon Kim

    In data center servers, power management (PM) exploiting Dynamic Voltage and Frequency Scaling (DVFS) for processors can play a crucial role to improve energy efficiency. However, we observe that current PM policies (i.e., governors) not only considerably increase tail response time (i.e., violate a given Service Level Objective (SLO)) but also hurt energy efficiency. Tackling limitations of current

    Updated: 2020-04-18
  • Brutus: Refuting the Security Claims of the Cache Timing Randomization Countermeasure Proposed in CEASER
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-06
    Rahul Bodduna; Vinod Ganesan; Patanjali SLPSK; Kamakoti Veezhinathan; Chester Rebeiro

    Cache timing attacks are a serious threat to the security of computing systems. It permits sensitive information, such as cryptographic keys, to leak across virtual machines and even to remote servers. Encrypted Address Cache, proposed by CEASER - a best paper candidate at MICRO 2018 - is a promising countermeasure that stymies the timing channel by employing cryptography to randomize the cache address

    Updated: 2020-04-18
  • Towards Scalable Analytics with Inference-Enabled Solid-State Drives
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-07-23
    Minsub Kim; Jaeha Kung; Sungjin Lee

    In this paper, we propose a novel storage architecture, called an Inference-Enabled SSD (IESSD), which employs FPGA-based DNN inference accelerators inside an SSD. IESSD is capable of performing DNN operations inside an SSD, avoiding frequent data movements between application servers and data storage. This boosts up analytics performance of DNN applications. Moreover, by placing accelerators near

    Updated: 2020-04-18
  • Challenges in Detecting an “Evasive Spectre”
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-02-24
    Congmiao Li; Jean-Luc Gaudiot

    Spectre attacks exploit serious vulnerabilities in modern CPU design to extract sensitive data through side channels. Completely fixing the problem would require a redesign of the architecture for conditional execution which cannot be backported. Researchers have proposed to detect Spectre with promising accuracy by monitoring deviations in microarchitectural events using existing hardware performance

    Updated: 2020-04-18
  • Characterizing and Understanding GCNs on GPU
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-01-30
    Mingyu Yan; Zhaodong Chen; Lei Deng; Xiaochun Ye; Zhimin Zhang; Dongrui Fan; Yuan Xie

    Graph convolutional neural networks (GCNs) have achieved state-of-the-art performance on graph-structured data analysis. Like traditional neural networks, training and inference of GCNs are accelerated with GPUs. Therefore, characterizing and understanding the execution pattern of GCNs on GPU is important for both software and hardware optimization. Unfortunately, to the best of our knowledge, there

    Updated: 2020-04-18
  • Post-Silicon Microarchitecture
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-09
    Chanchal Kumar; Aayush Chaudhary; Shubham Bhawalkar; Utkarsh Mathur; Saransh Jain; Adith Vastrad; Eric Rotenberg

    Microprocessors are designed to provide good general performance across a range of benchmarks. As such, microarchitectural techniques which provide good speedup for only a small subset of applications are not attractive when designing a general-purpose core. We propose coupling a reconfigurable fabric with the CPU, on the same chip, via a simple and flexible interface to allow post-silicon development

    Updated: 2020-04-18
  • Breaking In-Order Branch Miss Recovery
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-13
    Stijn Eyerman; Wim Heirman; Sam Van den Steen; Ibrahim Hur

    Despite very accurate branch predictors, branch misses remain an important source of performance limiters, especially for irregular applications. To ensure in-order commit, branch miss recovery is done in-order: all instructions after the oldest branch miss are flushed, even if they eventually reconverge with the correct path. We propose a technique to limit flushing to real wrong-path instructions

    Updated: 2020-04-18
  • Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-12
    Zhi-Gang Liu; Paul N. Whatmough; Matthew Mattina

    Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs), with very efficient local data movement, well suited to accelerating GEMM, and widely deployed in industry. In this letter, we describe two significant improvements

    Updated: 2020-04-18
  • A High-Performance Design of Generalized Pipeline Cellular Array
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-04-08
    Zhufei Chu; Huiming Tian; Zeqiang Li; Yinshui Xia; Lunyao Wang

    In this letter, we proposed a high-performance quantum-dot cellular automata (QCA) design of generalized pipeline cellular array (GPCA). The GPCA can perform all the basic arithmetic operations using only one arithmetic cell. Due to its flexibility, the high-performance GPCA design is of high interest for large-scale QCA designs. We proposed both the arithmetic unit and control unit designs of GPCA

    Updated: 2020-04-08
  • Exploiting Thermal Transients With Deterministic Turbo Clock Frequency
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-30
    Pierre Michaud

    Modern microprocessors feature turbo mechanisms that adjust the clock frequency dynamically so as to maximize processor performance under power and temperature limits. However, the documentation for commercial chips rarely provides more than a superficial description of how turbo works. This letter highlights certain aspects of turbo that are not well known outside the industry and that distinguish

    Updated: 2020-03-30
  • The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-03-16
    Srivatsan Krishnan; Zishen Wan; Kshitij Bhardwaj; Paul Whatmough; Aleksandra Faust; Gu-Yeon Wei; David Brooks; Vijay Janapa Reddi

    We introduce the “Formula-1” (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). F-1 serves as a tool that can aid computer and cyber-physical system architects to understand

    Updated: 2020-03-16
  • DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2020-02-14
    Shang Li; Zhiyuan Yang; Dhiraj Reddy; Ankur Srivastava; Bruce Jacob

    DRAM technology has developed rapidly in recent years. Several industrial solutions offer 3D packaging of DRAM and some are envisioning the integration of CPU and DRAM on the same die. These solutions allow higher density and better performance and also lower power consumption in DRAM designs. However, accurate simulation tools have not kept up with DRAM technology, especially for the modeling of 3D

    Updated: 2020-02-14
  • Exploring Prefetching, Pre-Execution and Branch Outcome Streaming for In-Memory Database Lookups
    IEEE Comput. Archit. Lett. (IF 1.109) Pub Date : 2019-12-16
    Mustafa Cavus; Mohammed Shatnawi; Resit Sendag; Augustus K. Uht

    Lookup operations for in-memory databases are heavily memory-bound because they often rely on pointer-chasing linked data structure traversals. They are also branch heavy with branches that are hard-to-predict due to random key lookups. In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions, only

    Updated: 2019-12-16
Contents have been reproduced by permission of the publishers.