Current journal: arXiv - CS - Hardware Architecture
  • E-BATCH: Energy-Efficient and High-Throughput RNN Batching
    arXiv.cs.AR Pub Date : 2020-09-22
    Franyell Silfa; Jose Maria Arnau; Antonio Gonzalez

    Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may largely differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they require

    Updated: 2020-09-23
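
    The padding cost mentioned in the E-BATCH abstract above can be illustrated with a minimal sketch. It assumes the simplest batching policy, in which every sequence in a batch is padded to the length of the longest one; the function name and the sample lengths are hypothetical and are not taken from the paper.

```python
def padding_fraction(lengths):
    """Fraction of time-step slots that hold padding when every
    sequence in the batch is padded to the longest sequence."""
    if not lengths:
        return 0.0
    longest = max(lengths)
    total_slots = longest * len(lengths)   # slots the hardware processes
    useful_slots = sum(lengths)            # slots that carry real input
    return (total_slots - useful_slots) / total_slots


# Hypothetical batch of four requests with very different lengths.
print(padding_fraction([5, 20, 8, 50]))
```

    When the lengths in a batch are equal the fraction is zero; the more they differ, the larger the share of slots that carry no real input.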
  • A reduced-precision streaming SpMV architecture for Personalized PageRank on FPGA
    arXiv.cs.AR Pub Date : 2020-09-22
    Alberto Parravicini; Francesco Sgherzi; Marco D. Santambrogio

    Sparse matrix-vector multiplication is often employed in many data-analytic workloads in which low latency and high throughput are more valuable than exact numerical convergence. FPGAs provide quick execution times while offering precise control over the accuracy of the results thanks to reduced-precision fixed-point arithmetic. In this work, we propose a novel streaming implementation of Coordinate

    Updated: 2020-09-23
  • A high-performance MEMRISTOR-based Smith-Waterman DNA sequence alignment Using FPNI structure
    arXiv.cs.AR Pub Date : 2020-09-21
    Mahdi Taheri; Hamed Zandevakili; Ali Mahani

    This paper presents a new reconfiguration sequencing method for handling differences in read length in the input data, a crucial drawback that affects DNA sequencing methods.

    Updated: 2020-09-22
  • A Survey of Resource Management for Processing-in-Memory and Near-Memory Processing Architectures
    arXiv.cs.AR Pub Date : 2020-09-21
    Kamil Khan; Sudeep Pasricha; Ryan Gary Kim

    Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become the bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers

    Updated: 2020-09-22
  • Tb/s Polar Successive Cancellation Decoder 16nm ASIC Implementation
    arXiv.cs.AR Pub Date : 2020-09-20
    Altuğ Süral; E. Göksu Sezer; Ertuğrul Kolağasıoğlu; Veerle Derudder; Kaoutar Bertrand

    This work presents an efficient ASIC implementation of successive cancellation (SC) decoder for polar codes. SC is a low-complexity depth-first search decoding algorithm, favorable for beyond-5G applications that require extremely high throughput and low power. The ASIC implementation of SC in this work exploits many techniques including pipelining and unrolling to achieve Tb/s data throughput without

    Updated: 2020-09-22
  • FlexWatts: A Power- and Workload-Aware Hybrid Power Delivery Network for Energy-Efficient Microprocessors
    arXiv.cs.AR Pub Date : 2020-09-18
    Jawad Haj-Yahya; Mohammed Alser; Jeremie S. Kim; Lois Orosa; Efraim Rotem; Avi Mendelson; Anupam Chattopadhyay; Onur Mutlu

    Modern client processors typically use one of three commonly used power delivery networks (PDNs): 1) motherboard voltage regulators (MBVR), 2) integrated voltage regulators (IVR), and 3) low dropout voltage regulators (LDO). We observe that the energy efficiency of each of these PDNs varies with the processor power (e.g., thermal design power (TDP) and dynamic power-state) and workload characteristics

    Updated: 2020-09-22
  • Open-Source Synthesizable Analog Blocks for High-Speed Link Designs: 20-GS/s 5b ENOB Analog-to-Digital Converter and 5-GHz Phase Interpolator
    arXiv.cs.AR Pub Date : 2020-09-18
    Sung-Jin Kim; Zachary Myers; Steven Herbst; ByongChan Lim; Mark Horowitz

    Using digital standard cells and digital place-and-route (PnR) tools, we created a 20 GS/s, 8-bit analog-to-digital converter (ADC) for use in high-speed serial link applications with an ENOB of 5.6, a DNL of 0.96 LSB, and an INL of 2.39 LSB, which dissipated 175 mW in 0.102 mm² in a 16nm technology. The design is entirely described by HDL so that it can be ported to other processes with minimal effort

    Updated: 2020-09-22
  • Load Driven Branch Predictor (LDBP)
    arXiv.cs.AR Pub Date : 2020-09-18
    Akash Sridhar; Nursultan Kabylkas; Jose Renau

    Branch instructions dependent on hard-to-predict load data are the leading branch misprediction contributors. Current state-of-the-art history-based branch predictors have poor prediction accuracy for these branches. Prior research backs this observation by showing that increasing the size of a 256-KBit history-based branch predictor to its 1-MBit variant has just a 10% reduction in branch mispredictions

    Updated: 2020-09-22
  • Thermal and IR Drop Analysis Using Convolutional Encoder-Decoder Networks
    arXiv.cs.AR Pub Date : 2020-09-18
    Vidya A. Chhabria; Vipul Ahuja; Ashwath Prabhu; Nikhil Patil; Palkesh Jain; Sachin S. Sapatnekar

    Computationally expensive temperature and power grid analyses are required during the design cycle to guide IC design. This paper employs encoder-decoder based generative (EDGe) networks to map these analyses to fast and accurate image-to-image and sequence-to-sequence translation tasks. The network takes a power map as input and outputs the corresponding temperature or IR drop map. We propose two

    Updated: 2020-09-22
  • Long Range Communication on Batteryless Devices
    arXiv.cs.AR Pub Date : 2020-09-20
    Simeon Babatunde; Nirnay Jain; Vishwas Powar

    The bulk of the existing Wireless Sensor Network (WSN) nodes are usually battery powered, stationary and mostly designed for short distance communication, with little to no consideration for constrained devices that operate solely on harvested energy. On many occasions, batteries and beefy super-capacitors are used to power these WSNs, but these systems are prone to service-life degradation and current-leakages

    Updated: 2020-09-22
  • Enabling Resource-Aware Mapping of Spiking Neural Networks via Spatial Decomposition
    arXiv.cs.AR Pub Date : 2020-09-19
    Adarsha Balaji; Shihao Song; Anup Das; Jeffrey Krichmar; Nikil Dutt; James Shackleford; Nagarajan Kandasamy; Francky Catthoor

    With growing model complexity, mapping Spiking Neural Network (SNN)-based applications to tile-based neuromorphic hardware is becoming increasingly challenging. This is because the synaptic storage resources on a tile, viz. a crossbar, can accommodate only a fixed number of pre-synaptic connections per post-synaptic neuron. For complex SNN models that have many pre-synaptic connections per neuron,

    Updated: 2020-09-22
  • MIRAGE: Mitigating Conflict-Based Cache Attacks with a Practical Fully-Associative Design
    arXiv.cs.AR Pub Date : 2020-09-18
    Gururaj Saileshwar; Moinuddin Qureshi

    Shared caches in modern processors are vulnerable to conflict-based attacks, whereby an attacker monitors the access pattern of a victim by engineering cache-set conflicts. Recent mitigations propose a randomized mapping of addresses to cache locations to obfuscate addresses that can conflict with a target address. Unfortunately, such designs continue to select eviction candidates from a small subset

    Updated: 2020-09-22
  • GrateTile: Efficient Sparse Tensor Tiling for CNN Processing
    arXiv.cs.AR Pub Date : 2020-09-18
    Yu-Sheng Lin; Hung Chang Lu; Yang-Bin Tsao; Yi-Min Chih; Wei-Chao Chen; Shao-Yi Chien

    We propose GrateTile, an efficient, hardware-friendly data storage scheme for sparse CNN feature maps (activations). It divides data into uneven-sized sub-tensors and, with small indexing overhead, stores them in a compressed yet randomly accessible format. This design enables modern CNN accelerators to fetch and decompress sub-tensors on-the-fly in a tiled processing manner. GrateTile is suitable

    Updated: 2020-09-21
  • Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer
    arXiv.cs.AR Pub Date : 2020-09-18
    Siyuan Lu; Meiqi Wang; Shuang Liang; Jun Lin; Zhongfeng Wang

    Designing hardware accelerators for deep neural networks (DNNs) has been much desired. Nonetheless, most of these existing accelerators are built for either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Recently, the Transformer model is replacing the RNN in the natural language processing (NLP) area. However, because of intensive matrix computations and complicated data

    Updated: 2020-09-21
  • FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching
    arXiv.cs.AR Pub Date : 2020-09-17
    Yaohua Wang; Lois Orosa; Xiangjun Peng; Yang Guo; Saugata Ghose; Minesh Patel; Jeremie S. Kim; Juan Gómez Luna; Mohammad Sadrosadati; Nika Mansouri Ghiasi; Onur Mutlu

    DRAM main memory is a performance bottleneck for many applications due to the high access latency. In-DRAM caches work to mitigate this latency by augmenting regular-latency DRAM with small-but-fast regions of DRAM that serve as a cache for the data held in the regular-latency region of DRAM. While an effective in-DRAM cache can allow a large fraction of memory requests to be served from a fast DRAM

    Updated: 2020-09-20
  • NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
    arXiv.cs.AR Pub Date : 2020-09-17
    Gagandeep Singh; Dionysios Diamantopoulos; Christoph Hagleitner; Juan Gomez-Luna; Sander Stuijk; Onur Mutlu; Henk Corporaal

    Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration

    Updated: 2020-09-20
  • Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics
    arXiv.cs.AR Pub Date : 2020-09-17
    Minesh Patel; Jeremie S. Kim; Taha Shahroodi; Hasan Hassan; Onur Mutlu

    Increasing single-cell DRAM error rates have pushed DRAM manufacturers to adopt on-die error-correction coding (ECC), which operates entirely within a DRAM chip to improve factory yield. The on-die ECC function and its effects on DRAM reliability are considered trade secrets, so only the manufacturer knows precisely how on-die ECC alters the externally-visible reliability characteristics. Consequently

    Updated: 2020-09-20
  • New Models for Understanding and Reasoning about Speculative Execution Attacks
    arXiv.cs.AR Pub Date : 2020-09-17
    Zecheng He; Guangyuan Hu; Ruby Lee

    Spectre and Meltdown attacks and their variants exploit performance optimization features to cause security breaches. Secret information is accessed and leaked through micro-architectural covert channels. New attack variants keep appearing and we do not have a systematic way to capture the critical characteristics of these attacks and evaluate why they succeed. In this paper, we provide a new attack-graph

    Updated: 2020-09-20
  • Probabilistic Value-Deviation-Bounded Source-Dependent Bit-Level Channel Adaptation for Approximate Communication
    arXiv.cs.AR Pub Date : 2020-09-16
    Bilgesu Arif Bilgin; Phillip Stanley-Marbell

    Computing systems that can tolerate effects of errors in their communicated data values can trade this tolerance for improved resource efficiency. Many important applications of computing, such as embedded sensor systems, can tolerate errors that are bounded in their distribution of deviation from correctness (distortion). We present a channel adaptation technique which modulates properties of I/O

    Updated: 2020-09-20
  • Enabling Virtual Memory Research on RISC-V with a Configurable TLB Hierarchy for the Rocket Chip Generator
    arXiv.cs.AR Pub Date : 2020-09-16
    Nikolaos Charalampos Papadopoulos; Vasileios Karakostas; Konstantinos Nikas; Nectarios Koziris; Dionisios N. Pnevmatikatos

    The Rocket Chip Generator uses a collection of parameterized processor components to produce RISC-V-based SoCs. It is a powerful tool that can produce a wide variety of processor designs ranging from tiny embedded processors to complex multi-core systems. In this paper we extend the features of the Memory Management Unit of the Rocket Chip Generator and specifically the TLB hierarchy. TLBs are essential

    Updated: 2020-09-18
  • GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis
    arXiv.cs.AR Pub Date : 2020-09-16
    Damla Senol Cali; Gurpreet S. Kalsi; Zülal Bingöl; Can Firtina; Lavanya Subramanian; Jeremie S. Kim; Rachata Ausavarungnirun; Mohammed Alser; Juan Gomez-Luna; Amirali Boroumand; Anant Nori; Allison Scibisz; Sreenivas Subramoney; Can Alkan; Saugata Ghose; Onur Mutlu

    Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major contributor

    Updated: 2020-09-18
  • SideLine: How Delay-Lines (May) Leak Secrets from your SoC
    arXiv.cs.AR Pub Date : 2020-09-16
    Joseph Gravellier; Jean-Max Dutertre; Yannick Teglia; Philippe Loubet Moundi

    To meet the ever-growing need for performance in silicon devices, SoC providers have been increasingly relying on software-hardware cooperation. By controlling hardware resources such as power or clock management from the software, developers earn the possibility to build more flexible and power efficient applications. Despite the benefits, these hardware components are now exposed to software code

    Updated: 2020-09-18
  • Analog vs. Digital Spatial Transforms: A Throughput, Power, and Area Comparison
    arXiv.cs.AR Pub Date : 2020-09-15
    Zephan M. Enciso; Seyed Hadi Mirfarshbafan; Oscar Castañeda; Clemens JS. Schaefer; Christoph Studer; Siddharth Joshi

    Spatial linear transforms that process multiple parallel analog signals to simplify downstream signal processing find widespread use in multi-antenna communication systems, machine learning inference, data compression, audio and ultrasound applications, among many others. In the past, a wide range of mixed-signal as well as digital spatial transform circuits have been proposed; it is, however, a longstanding

    Updated: 2020-09-18
  • Secure Internal Communication of a Trustzone-Enabled Heterogeneous Soc Lightweight Encryption
    arXiv.cs.AR Pub Date : 2020-09-15
    El Mehdi Benhani (LHC); Cuauhtemoc Mancillas Lopez (CINVESTAV-IPN); Lilian Bossuet (LHC)

    Security in TrustZone-enabled heterogeneous systems-on-chip (SoCs) has been gaining increasing attention for several years, mainly because this type of SoC can be found in more and more applications in servers or in the cloud. The inside-SoC communication layer is one of the main elements of a heterogeneous SoC; indeed, all the data goes through it. Monitoring and controlling inside-SoC communications enables

    Updated: 2020-09-16
  • The Cost of Software-Based Memory Management Without Virtual Memory
    arXiv.cs.AR Pub Date : 2020-09-14
    Drew Zagieboylo; G. Edward Suh; Andrew C. Myers

    Virtual memory has been a standard hardware feature for more than three decades. At the price of increased hardware complexity, it has simplified software and promised strong isolation among colocated processes. In modern computing systems, however, the costs of virtual memory have increased significantly. With large memory workloads, virtualized environments, data center computing, and chips with

    Updated: 2020-09-16
  • A Systematic Study of Lattice-based NIST PQC Algorithms: from Reference Implementations to Hardware Accelerators
    arXiv.cs.AR Pub Date : 2020-09-15
    Malik Imran; Zain Ul Abideen; Samuel Pagliarini

    Security of currently deployed public key cryptography algorithms is foreseen to be vulnerable against quantum computer attacks. Hence, a community effort exists to develop post-quantum cryptography (PQC) algorithms, i.e., algorithms that are resistant to quantum attacks. In this work, we have investigated how lattice-based candidate algorithms from the NIST PQC standardization competition fare when

    Updated: 2020-09-16
  • The Hardware Lottery
    arXiv.cs.AR Pub Date : 2020-09-14
    Sara Hooker

    Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail). This essay introduces the term hardware lottery to describe when a research idea wins because it

    Updated: 2020-09-15
  • DANCE: Differentiable Accelerator/Network Co-Exploration
    arXiv.cs.AR Pub Date : 2020-09-14
    Kanghyun Choi; Deokki Hong; Hojae Yoon; Joonsang Yu; Youngsok Kim; Jinho Lee

    To cope with the ever-increasing computational demand of DNN execution, recent neural architecture search (NAS) algorithms take hardware cost metrics, such as GPU latency, into account. To further pursue fast, efficient execution, DNN-specialized hardware accelerators are being designed for multiple purposes, which far exceed the efficiency of GPUs. However, those hardware-related metrics

    Updated: 2020-09-15
  • AutoML for Multilayer Perceptron and FPGA Co-design
    arXiv.cs.AR Pub Date : 2020-09-14
    Philip Colangelo; Oren Segal; Alex Speicher; Martin Margala

    State-of-the-art Neural Network Architectures (NNAs) are challenging to design and implement efficiently in hardware. In the past couple of years, this has led to an explosion in research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and attempt to optimize for hardware usage and design. Much of the recent research

    Updated: 2020-09-15
  • A Survey of FPGA-Based Robotic Computing
    arXiv.cs.AR Pub Date : 2020-09-13
    Zishen Wan; Bo Yu; Thomas Yuang Li; Jie Tang; Yuhao Zhu; Yu Wang; Arijit Raychowdhury; Shaoshan Liu

    Recent research on robotics has shown significant improvement, spanning algorithms, mechanics, and hardware architectures. Robots, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms poses great challenges to their application. On the one hand, CPU platform

    Updated: 2020-09-15
  • An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication
    arXiv.cs.AR Pub Date : 2020-09-11
    Andreas Kurth; Wolfgang Rönninger; Thomas Benz; Matheus Cavalcante; Fabian Schuiki; Florian Zaruba; Luca Benini

    On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it continues to gain importance as the number of cores, the heterogeneity of components, and the on-chip and off-chip bandwidth continue to grow. Decades of research on on-chip networks enabled cache-coherent shared-memory multiprocessors. However, communication fabrics that meet the needs of heterogeneous

    Updated: 2020-09-14
  • DMR-based Technique for Fault Tolerant AES S-box Architecture
    arXiv.cs.AR Pub Date : 2020-09-11
    Mahdi Taheri; Saeideh Sheikhpour; Mohammad Saeed Ansari; Ali Mahani

    This paper presents a high-throughput fault-resilient hardware implementation of AES S-box, called HFS-box. If a transient natural or even malicious fault in each pipeline stage is detected, the corresponding error signal becomes high and as a result, the control unit holds the output of our proposed DMR voter till the fault effect disappears. The proposed low-cost HFS-box provides a high capability

    Updated: 2020-09-14
  • Accelerating Recommender Systems via Hardware "scale-in"
    arXiv.cs.AR Pub Date : 2020-09-11
    Suresh Krishna; Ravi Krishna

    In today's era of "scale-out", this paper makes the case that a specialized hardware architecture based on "scale-in"--placing as many specialized processors as possible along with their memory systems and interconnect links within one or two boards in a rack--would offer the potential to boost large recommender system throughput by 12-62x for inference and 12-45x for training compared to the DGX-2

    Updated: 2020-09-14
  • MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing
    arXiv.cs.AR Pub Date : 2020-09-10
    Gokul Subramanian Ravi; Ramon Bertran; Pradip Bose; Mikko Lipasti

    We present MicroGrad, a centralized automated framework that is able to efficiently analyze the capabilities, limits and sensitivities of complex modern processors in the face of constantly evolving application domains. MicroGrad uses Microprobe, a flexible code generation framework as its back-end and a Gradient Descent based tuning mechanism to efficiently enable the evolution of the test cases to

    Updated: 2020-09-11
  • Development of a Predictive Process Design kit for 15-nm FinFETs: FreePDK15
    arXiv.cs.AR Pub Date : 2020-09-09
    Kirti Bhanushali; Chinmay Tembe; W. Rhett Davis

    FinFETs are predicted to advance semiconductor scaling for sub-20nm devices. In order to support their introduction into research and universities it is crucial to develop an open source predictive process design kit. This paper discusses in detail the design process for such a kit for 15nm FinFET devices, called the FreePDK15. The kit consists of a layer stack with thirteen-metal layers based on hier

    Updated: 2020-09-11
  • Time-Based Roofline for Deep Learning Performance Analysis
    arXiv.cs.AR Pub Date : 2020-09-09
    Yunsong Wang; Charlene Yang; Steven Farrell; Yan Zhang; Thorsten Kurth; Samuel Williams

    Deep learning applications are usually very compute-intensive and require a long runtime for training and inference. This has been tackled by researchers from both hardware and software sides, and in this paper, we propose a Roofline-based approach to performance analysis to facilitate the optimization of these applications. This approach is an extension of the Roofline model widely used in traditional

    Updated: 2020-09-11
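
    For readers unfamiliar with the Roofline model that the entry above builds on, the following minimal sketch shows the classic form only: attainable performance is the smaller of peak compute and memory bandwidth times arithmetic intensity. It is not the time-based extension proposed in the paper, and the machine numbers are hypothetical.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Classic Roofline bound.

    peak_gflops:          peak compute throughput (GFLOP/s)
    bandwidth_gbs:        peak memory bandwidth (GB/s)
    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
print(attainable_gflops(1000.0, 100.0, 2.0))    # memory-bound kernel
print(attainable_gflops(1000.0, 100.0, 50.0))   # compute-bound kernel
print(ridge_point(1000.0, 100.0))
```

    Kernels whose arithmetic intensity lies below the ridge point are limited by memory bandwidth; those above it are limited by peak compute.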
  • Asymmetric Aging Effect on Modern Microprocessors
    arXiv.cs.AR Pub Date : 2020-09-08
    Freddy Gabbay; Avi Mendelson

    Reliability is a crucial requirement in any modern microprocessor to assure correct execution over its lifetime. As mission-critical components become common in commodity systems (e.g., control of autonomous cars), the demand for reliable processing has heightened even further. The latest process technologies have worsened the situation; thus, microprocessor design has become highly susceptible to

    Updated: 2020-09-10
  • Going Deep: Using deep learning techniques with simplified mathematical models against XOR BR and TBR PUFs (Attacks and Countermeasures)
    arXiv.cs.AR Pub Date : 2020-09-09
    Mahmoud Khalafalla; Mahmoud A. Elmohr; Catherine Gebotys

    This paper contributes to the study of PUFs vulnerability against modeling attacks by evaluating the security of XOR BR PUFs, XOR TBR PUFs, and obfuscated architectures of XOR BR PUF using a simplified mathematical model and deep learning (DL) techniques. Obtained results show that DL modeling attacks could easily break the security of 4-input XOR BR PUFs and 4-input XOR TBR PUFs with modeling accuracy

    Updated: 2020-09-10
  • Quad-Core RSA Processor with Countermeasure Against Power Analysis Attacks
    arXiv.cs.AR Pub Date : 2020-09-08
    Javad Bagherzadeh; Vishishtha Bothra; Disha Gujar; Sugandha Gupta; Jinal Shah

    The Rivest-Shamir-Adleman (RSA) cryptosystem uses modular multiplication for encryption and decryption, so the performance of RSA can be drastically improved by optimizing modular multiplication. This paper proposes a new parallel, high-radix Montgomery multiplier for a 1024-bit multi-core RSA processor. Each computation step operates in radix 4. The computation speed is increased by more than 4 times. We

    Updated: 2020-09-10
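
    As background for the entry above, here is a minimal software sketch of textbook Montgomery modular multiplication. It performs the reduction in one step with R = 2^bits rather than iterating in radix 4, so it illustrates the general technique and not the paper's hardware design. All function names are hypothetical, the modulus must be odd and smaller than R, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """Return R = 2**bits and n' = -n^{-1} mod R for an odd modulus n < R."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def montgomery_reduce(t, n, bits, n_prime):
    """Compute t * R^{-1} mod n for 0 <= t < n * R, without dividing by n."""
    r_mask = (1 << bits) - 1
    m = ((t & r_mask) * n_prime) & r_mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits                 # t + m*n is divisible by R
    return u - n if u >= n else u


def modmul(a, b, n, bits):
    """Compute (a * b) mod n via the Montgomery domain."""
    r, n_prime = montgomery_setup(n, bits)
    a_bar = (a * r) % n                     # map operands into Montgomery form
    b_bar = (b * r) % n
    c_bar = montgomery_reduce(a_bar * b_bar, n, bits, n_prime)
    return montgomery_reduce(c_bar, n, bits, n_prime)  # map the result back


print(modmul(7, 5, 13, 4))
```

    The conversion into Montgomery form still uses an ordinary modulo; the technique pays off when many multiplications, as in modular exponentiation, are chained before converting back.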
  • High-Bandwidth Spatial Equalization for mmWave Massive MU-MIMO with Processing-In-Memory
    arXiv.cs.AR Pub Date : 2020-09-08
    Oscar Castañeda; Sven Jacobsson; Giuseppe Durisi; Tom Goldstein; Christoph Studer

    All-digital basestation (BS) architectures enable superior spectral efficiency compared to hybrid solutions in massive multi-user MIMO systems. However, supporting large bandwidths with all-digital architectures at mmWave frequencies is challenging as traditional baseband processing would result in excessively high power consumption and large silicon area. The recently-proposed concept of finite-alphabet

    Updated: 2020-09-10
  • On Architecture to Architecture Mapping for Concurrency
    arXiv.cs.AR Pub Date : 2020-09-08
    Soham Chakraborty

    Mapping programs from one architecture to another plays a key role in technologies such as binary translation, decompilation, emulation, virtualization, and application migration. Although multicore architectures are ubiquitous, the state-of-the-art translation tools do not handle concurrency primitives correctly. Doing so is rather challenging because of the subtle differences in the concurrency models

    Updated: 2020-09-10
  • Exploiting Extended Krylov Subspace for the Reduction of Regular and Singular Circuit Models
    arXiv.cs.AR Pub Date : 2020-07-03
    Chrysostomos Chatzigeorgiou; Dimitrios Garyfallou; George Floros; Nestor Evmorfopoulos; George Stamoulis

    During the past decade, Model Order Reduction (MOR) has become a key enabler for the efficient simulation of large circuit models. MOR techniques based on moment-matching are well established due to their simplicity and computational performance in the reduction process. However, moment-matching methods based on the ordinary Krylov subspace are usually inadequate to accurately approximate the original

    Updated: 2020-09-10
  • PolyAdd: Polynomial Formal Verification of Adder Circuits
    arXiv.cs.AR Pub Date : 2020-09-07
    Rolf Drechsler

    Functional correctness can be ensured only by formal verification approaches. While for many circuits fast verification is possible, in other cases the approaches fail. In general no efficient algorithms can be given, since the underlying verification problem is NP-complete. In this paper we prove that for different types of adder circuits polynomial verification can be ensured based on BDDs. While

    Updated: 2020-09-08
  • Critical Business Decision Making for Technology Startups -- A PerceptIn Case Study
    arXiv.cs.AR Pub Date : 2020-09-07
    Shaoshan Liu

    Most business decisions are made with analysis, but some are judgment calls not susceptible to analysis due to time or information constraints. In this article, we present a real-life case study of critical business decision making of PerceptIn, an autonomous driving technology startup. In its early years, PerceptIn had to make a decision on the design of computing systems for its autonomous

    Updated: 2020-09-08
  • Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
    arXiv.cs.AR Pub Date : 2020-09-04
    Zhi-Gang Liu; Paul N. Whatmough; Matthew Mattina

    Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM for CNN inference, and in particular, structural sparsity has the advantages of predictable load balancing and very low index overhead. In this paper, we address a

    Updated: 2020-09-08
  • A Class of Optimal Structures for Node Computations in Message Passing Algorithms
    arXiv.cs.AR Pub Date : 2020-09-05
    Xuan He; Kui Cai; Liang Zhou

    Consider the computations at a node in the message passing algorithms. Assume that the node has incoming and outgoing messages $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, respectively. In this paper, we investigate a class of structures that can be adopted by the node for computing $\mathbf{y}$ from $\mathbf{x}$, where each $y_j, j = 1, 2, \ldots, n$ is computed

    Updated: 2020-09-08
  • Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs
    arXiv.cs.AR Pub Date : 2020-09-05
    Charlene Yang

    This paper surveys a range of methods to collect necessary performance data on Intel CPUs and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their supported feature set. This paper fills the gap for when these tools are not available, or when users would like a more customized

    Updated: 2020-09-08
  • 2.5D Root of Trust: Secure System-Level Integration of Untrusted Chiplets
    arXiv.cs.AR Pub Date : 2020-09-04
    Mohammed Nabeel; Mohammed Ashraf; Satwik Patnaik; Vassos Soteriou; Ozgur Sinanoglu; Johann Knechtel

    For the first time, we leverage the 2.5D interposer technology to establish system-level security in the face of hardware- and software-centric adversaries. More specifically, we integrate chiplets (i.e., third-party hard intellectual property of complex functionality, like microprocessors) using a security-enforcing interposer. Such hardware organization provides a robust 2.5D root of trust for trustworthy

    Updated: 2020-09-08
  • CLEANN: Accelerated Trojan Shield for Embedded Neural Networks
    arXiv.cs.AR Pub Date : 2020-09-04
    Mojan Javaheripi; Mohammad Samragh; Gregory Fields; Tara Javidi; Farinaz Koushanfar

    We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger. What differentiates CLEANN from the prior work is its lightweight methodology which recovers the ground-truth

    Updated: 2020-09-08
  • ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
    arXiv.cs.AR Pub Date : 2020-09-04
    Sheng-Chun Kao; Geonhwa Jeong; Tushar Krishna

    DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs during the DNN computations to reduce data movement from DRAM to the chip. The reuse is captured by the accelerator's dataflow. While there has been significant prior work in exploring and comparing various dataflows, the strategy for assigning on-chip hardware resources (i.e., compute and memory) given a dataflow

    Updated: 2020-09-08
  • Virtualized Logical Qubits: A 2.5D Architecture for Error-Corrected Quantum Computing
    arXiv.cs.AR Pub Date : 2020-09-04
    Casey Duckering; Jonathan M. Baker; David I. Schuster; Frederic T. Chong

    Current, near-term quantum devices have shown great progress in recent years culminating with a demonstration of quantum supremacy. In the medium-term, however, quantum machines will need to transition to greater reliability through error correction, likely through promising techniques such as surface codes which are well suited for near-term devices with limited qubit connectivity. We discover quantum

    Updated: 2020-09-08
  • Scalable Light-Weight Integration of FPGA Based Accelerators with Chip Multi-Processors
    arXiv.cs.AR Pub Date : 2020-09-03
    Zhe Lin; Sharad Sinha; Hao Liang; Liang Feng; Wei Zhang

    Modern multicore systems are migrating from homogeneous systems to heterogeneous systems with accelerator-based computing in order to overcome the barriers of performance and power walls. In this trend, FPGA-based accelerators are becoming increasingly attractive, due to their excellent flexibility and low design cost. In this paper, we propose the architectural support for efficient interfacing between

    Updated: 2020-09-05
  • Decision Tree Based Hardware Power Monitoring for Run Time Dynamic Power Management in FPGA
    arXiv.cs.AR Pub Date : 2020-09-03
    Zhe Lin; Wei Zhang; Sharad Sinha

    Fine-grained runtime power management techniques could be promising solutions for power reduction. Therefore, it is essential to establish accurate power monitoring schemes to obtain dynamic power variation in a short period (i.e., tens or hundreds of clock cycles). In this paper, we leverage a decision-tree-based power modeling approach to establish fine-grained hardware power monitoring on FPGA platforms

    Updated: 2020-09-05
  • An Ensemble Learning Approach for In-situ Monitoring of FPGA Dynamic Power
    arXiv.cs.AR Pub Date : 2020-09-03
    Zhe Lin; Sharad Sinha; Wei Zhang

    As field-programmable gate arrays become prevalent in critical application domains, their power consumption is of high concern. In this paper, we present and evaluate a power monitoring scheme capable of accurately estimating the runtime dynamic power of FPGAs in a fine-grained timescale, in order to support emerging power management techniques. In particular, we describe a novel and specialized ensemble

    Updated: 2020-09-05
  • Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors
    arXiv.cs.AR Pub Date : 2020-09-03
    Duy Thanh Nguyen; Hyun Kim; Hyuk-Jae Lee

    Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are frequently quite different, previous hardware designs have employed common optimization schemes for them. This paper proposes a layer-specific design that employs different

    Updated: 2020-09-05
  • Agile SoC Development with Open ESP
    arXiv.cs.AR Pub Date : 2020-09-02
    Paolo Mantovani; Davide Giri; Giuseppe Di Guglielmo; Luca Piccolboni; Joseph Zuckerman; Emilio G. Cota; Michele Petracca; Christian Pilato; Luca P. Carloni

    ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design

    Updated: 2020-09-03
  • CONTRA: Area-Constrained Technology Mapping Framework For Memristive Memory Processing Unit
    arXiv.cs.AR Pub Date : 2020-09-02
    Debjyoti Bhattacharjee; Anupam Chattopadhyay; Srijit Dutta; Ronny Ronen; Shahar Kvatinsky

    Data-intensive applications are poised to benefit directly from processing-in-memory platforms, such as memristive Memory Processing Units, which allow leveraging data locality and performing stateful logic operations. Developing design automation flows for such platforms is a challenging and highly relevant research problem. In this work, we investigate the problem of minimizing delay under arbitrary

    Updated: 2020-09-03
  • HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis
    arXiv.cs.AR Pub Date : 2020-09-02
    Zhe Lin; Jieru Zhao; Sharad Sinha; Wei Zhang

    High-level synthesis (HLS) enables designers to customize hardware designs efficiently. However, it is still challenging to foresee the correlation between power consumption and HLS-based applications at an early design stage. To overcome this problem, we introduce HL-Pow, a power modeling framework for FPGA HLS based on state-of-the-art machine learning techniques. HL-Pow incorporates an automated

    Updated: 2020-09-03
  • Architectural Implications of Graph Neural Networks
    arXiv.cs.AR Pub Date : 2020-09-02
    Zhihui Zhang; Jingwen Leng; Lingxiao Ma; Youshan Miao; Chao Li; Minyi Guo

    Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures. They are becoming increasingly popular due to the high accuracy they achieve in many graph-related tasks. However, GNNs are not as well understood in the system and architecture community as their counterparts such as multi-layer perceptrons and convolutional neural networks. This work tries to introduce

    Updated: 2020-09-03
  • TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference
    arXiv.cs.AR Pub Date : 2020-09-01
    Mostafa Mahmoud; Isak Edo; Ali Hadi Zadeh; Omar Mohamed Awad; Gennady Pekhimenko; Jorge Albericio; Andreas Moshovos

    TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost, sparse input operand interconnect comprising an 8-input multiplexer per multiplier

    Updated: 2020-09-03
Contents have been reproduced by permission of the publishers.