Current journal: ACM Transactions on Architecture and Code Optimization
  • Editorial: A Message from the Editor-in-Chief
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-03
    Dave Kaeli

    No abstract available.

    Updated: 2020-08-18
  • Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-03
    Ram Rangan; Mark W. Stephenson; Aditya Ukarande; Shyam Murthy; Virat Agarwal; Marc Blackstein

    In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that there is a high likelihood of one of the register operands of several multiply, logical-and, and similar operations being zero, dynamically. We provide intuition, examples, and a quantitative characterization for how zeros originate dynamically in these programs.

    Updated: 2020-08-18
  • GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-03
    Karel Adámek; Sofia Dimoudi; Mike Giles; Wesley Armour

    We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language), which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save

    Updated: 2020-08-18
  • FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-17
    Arnab Das; Sriram Krishnamoorthy; Ian Briggs; Ganesh Gopalakrishnan; Ramakrishna Tipireddy

    We present FPDetect, a low-overhead approach for detecting logical errors and soft errors affecting stencil computations without generating false positives. We develop an offline analysis that tightly estimates the number of floating-point bits preserved across stencil applications. This estimate rigorously bounds the values expected in the data space of the computation. Violations of this bound can

    Updated: 2020-08-18
  • Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-17
    Tarek S. Abdelrahman

    We consider software-hardware acceleration of K-means clustering on the Intel Xeon+FPGA platform. We design a pipelined accelerator for K-means and combine it with CPU threads to assess performance benefits of (1) acceleration when data are only accessed from system memory and (2) cooperative CPU-FPGA acceleration. Our evaluation shows that the accelerator is up to 12.7×/2.4× faster than a single CPU

    Updated: 2020-08-18
  • Securing Branch Predictors with Two-Level Encryption
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-03
    Jaekyu Lee; Yasuo Ishii; Dam Sunwoo

    Modern processors rely on various speculative mechanisms to meet performance demand. Branch predictors are one of the most important micro-architecture components to deliver performance. However, they have been under heavy scrutiny because of recent side-channel attacks. Branch predictors are indexed using the PC and recent branch histories. An adversary can manipulate these parameters to access and

    Updated: 2020-08-18
  • EchoBay: Design and Optimization of Echo State Networks under Memory and Time Constraints
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-17
    L. Cerina; M. D. Santambrogio; G. Franco; C. Gallicchio; A. Micheli

    The increase in computational power of embedded devices and the latency demands of novel applications brought a paradigm shift on how and where the computation is performed. Although AI inference is slowly moving from the cloud to end-devices with limited resources, time-centric recurrent networks like Long-Short Term Memory remain too complex to be transferred on embedded devices without extreme simplifications

    Updated: 2020-08-18
  • Schedule Synthesis for Halide Pipelines on GPUs
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-03
    Savvas Sioutas; Sander Stuijk; Twan Basten; Henk Corporaal; Lou Somers

    The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule generation is currently only possible for multi-core CPU architectures. As a result, expert knowledge is still required when optimizing for platforms with

    Updated: 2020-08-18
  • Inter-kernel Reuse-aware Thread Block Scheduling
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-08-17
    Muhammad Huzaifa; Johnathan Alsop; Abdulrahman Mahmoud; Giordano Salvador; Matthew D. Sinclair; Sarita V. Adve

    As GPUs have become more programmable, their performance and energy benefits have made them increasingly popular. However, while GPU compute units continue to improve in performance, on-chip memories lag behind and data accesses are becoming increasingly expensive in performance and energy. Emerging GPU coherence protocols can mitigate this bottleneck by exploiting data reuse in GPU caches across kernel

    Updated: 2020-08-18
  • ArmorAll
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Charu Kalra; Fritz Previlon; Norm Rubin; David Kaeli

    The vulnerability of GPUs to soft errors has become a first-class design concern as they are increasingly being used in accuracy-sensitive and safety-critical domains. Existing solutions used to enhance the reliability of GPUs come with significant overhead in terms of area, power, and/or performance. In this article, we propose ArmorAll, a light-weight, adaptive, selective, and portable software solution

    Updated: 2020-05-29
  • Dynamic Precision Autotuning with TAFFO
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Stefano Cherubin; Daniele Cattaneo; Michele Chiari; Giovanni Agosta

    Many classes of applications, both in the embedded and high performance domains, can trade off the accuracy of the computed results for computation performance. One way to achieve such a trade-off is precision tuning—that is, to modify the data types used for the computation by reducing the bit width, or by changing the representation from floating point to fixed point. We present a methodology for

    Updated: 2020-05-29
  • Runtime Design Space Exploration and Mapping of DCNNs for the Ultra-Low-Power Orlando SoC
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Ahmet Erdem; Cristina Silvano; Thomas Boesch; Andrea Carlo Ornstein; Surinder-Pal Singh; Giuseppe Desoli

    Recent trends in deep convolutional neural networks (DCNNs) impose hardware accelerators as a viable solution for computer vision and speech recognition. The Orlando SoC architecture from STMicroelectronics targets exactly this class of problems by integrating hardware-accelerated convolutional blocks together with DSPs and on-chip memory resources to enable energy-efficient designs of DCNNs. The main

    Updated: 2020-05-29
  • Reliability Analysis for Unreliable FSM Computations
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Amir Hossein Nodehi Sabet; Junqiao Qiu; Zhijia Zhao; Sriram Krishnamoorthy

    Finite State Machines (FSMs) are fundamental in both hardware design and software development. However, the reliability of FSM computations remains poorly understood. Existing reliability analyses are mainly designed for generic computations and are unaware of the special error tolerance characteristics in FSM computations. This work introduces RelyFSM -- a state-level reliability analysis framework

    Updated: 2020-05-29
  • Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Jiachen Xue; T. N. Vijaykumar; Mithuna Thottethodi

    Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications, such as Web search and memcached. InfiniBand’s Shared Receive Queues (SRQs), which use two-sided send/recv verbs (i.e., channel semantics), reduce the amount of

    Updated: 2020-05-29
  • A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Qinggang Wang; Long Zheng; Jieshan Zhao; Xiaofei Liao; Hai Jin; Jingling Xue

    FPGA-based graph processing accelerators are nowadays equipped with multiple pipelines for hardware acceleration of graph computations. However, their multi-pipeline efficiency can suffer greatly from the considerable overheads caused by the read/write conflicts in their on-chip BRAM from different pipelines, leading to significant performance degradation and poor scalability. In this article, we investigate

    Updated: 2020-05-29
  • SIMT-X
    ACM Trans. Archit. Code Optim. (IF 1.309) Pub Date : 2020-05-29
    Anita Tino; Caroline Collange; André Seznec

    This work introduces Single Instruction Multi-Thread Express (SIMT-X), a general-purpose Central Processing Unit (CPU) microarchitecture that enables Graphics Processing Units (GPUs)-style SIMT execution across multiple threads of the same program for high throughput, while retaining the latency benefits of out-of-order execution, and the programming convenience of homogeneous multi-thread processors

    Updated: 2020-05-29
Contents have been reproduced by permission of the publishers.