Current journal: arXiv - CS - Hardware Architecture
  • Verification and Design Methods for the BrainScaleS Neuromorphic Hardware System
    arXiv.cs.AR Pub Date : 2020-03-25
    Andreas Grübl; Sebastian Billaudelle; Benjamin Cramer; Vitali Karasenko; Johannes Schemmel

    This paper presents verification and implementation methods that have been developed for the design of the BrainScaleS-2 65nm ASICs. The 2nd generation BrainScaleS chips are mixed-signal devices with tight coupling between full-custom analog neuromorphic circuits and two general purpose microprocessors (PPU) with SIMD extension for on-chip learning and plasticity. Simulation methods for automated analysis

    Updated: 2020-03-26
  • Soft-Error and Hard-fault Tolerant Architecture and Routing Algorithm for Reliable 3D-NoC Systems
    arXiv.cs.AR Pub Date : 2020-03-21
    Khanh N. Dang; Yuichi Okuyama; Abderazek Ben Abdallah

    The Network-on-Chip (NoC) paradigm has been proposed as an auspicious solution to handle the strict communication requirements between the increasingly large number of cores on single multi- and many-core chips. However, NoC systems are exposed to a variety of manufacturing, design and energetic-particle factors, making them vulnerable to permanent (hard) faults and transient (soft) errors. In this paper

    Updated: 2020-03-24
  • A 75kb SRAM in 65nm CMOS for In-Memory Computing Based Neuromorphic Image Denoising
    arXiv.cs.AR Pub Date : 2020-03-23
    Sumon Kumar Bose; Vivek Mohan; Arindam Basu

    This paper presents an in-memory computing (IMC) architecture for image denoising. The proposed SRAM based in-memory processing framework works in tandem with approximate computing on a binary image generated from neuromorphic vision sensors. Implemented in TSMC 65nm process, the proposed architecture enables approximately 2000X energy savings (approximately 222X from IMC) compared to a digital implementation

    Updated: 2020-03-24
  • Reliability Assessment and Quantitative Evaluation of Soft-Error Resilient 3D Network-on-Chip Systems
    arXiv.cs.AR Pub Date : 2020-03-21
    Khanh N Dang; Michael Meyer; Yuichi Okuyama; Abderazek Ben Abdallah

    Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as an auspicious solution, merging the high parallelism of the Network-on-Chip (NoC) paradigm with the high-performance and low-power cost of 3D-ICs. However, as technology scales down, the reliability issues are becoming more crucial, especially for complex 3D-NoC which provides the communication requirements of multi and many-core systems-on-chip

    Updated: 2020-03-24
  • Report on power, thermal and reliability prediction for 3D Networks-on-Chip
    arXiv.cs.AR Pub Date : 2020-03-19
    Khanh N. Dang; Akram Ben Ahmed; Abderazek Ben Abdallah; Xuan-Tu Tran

    By combining Three Dimensional Integrated Circuits with the Network-on-Chip infrastructure to obtain 3D Networks-on-Chip (3D-NoCs), the new on-chip communication paradigm brings several advantages, including lower power, smaller footprint and lower latency. However, thermal dissipation is one of the most critical challenges for 3D-ICs where the heat cannot easily transfer through several layers of silicon

    Updated: 2020-03-20
  • Physical Time-Varying Transfer Functions as Generic Low-Overhead Power-SCA Countermeasure
    arXiv.cs.AR Pub Date : 2020-03-16
    Archisman Ghosh; Debayan Das; Shreyas Sen

    Mathematically-secure cryptographic algorithms leak significant side channel information through their power supplies when implemented on a physical platform. These side channel leakages can be exploited by an attacker to extract the secret key of an embedded device. The existing state-of-the-art countermeasures mainly focus on power balancing, gate-level masking, or signal-to-noise ratio (SNR) reduction

    Updated: 2020-03-18
  • A Power-Efficient Binary-Weight Spiking Neural Network Architecture for Real-Time Object Classification
    arXiv.cs.AR Pub Date : 2020-03-12
    Pai-Yu Tan; Po-Yao Chuang; Yen-Ting Lin; Cheng-Wen Wu; Juin-Ming Lu

    Neural network hardware is considered an essential part of future edge devices. In this paper, we propose a binary-weight spiking neural network (BW-SNN) hardware architecture for low-power real-time object classification on edge platforms. This design stores a full neural network on-chip, and hence requires no off-chip bandwidth. The proposed systolic array maximizes data reuse for a typical convolutional

    Updated: 2020-03-16
  • Cycle-Accurate Evaluation of Software-Hardware Co-Design of Decimal Computation in RISC-V Ecosystem
    arXiv.cs.AR Pub Date : 2020-03-11
    Riaz-ul-haque Mian; Michihiro Shintani; Michiko Inoue

    Software-hardware co-design solutions for decimal computation can provide several Pareto points for the development of embedded systems in terms of hardware cost and performance. This paper demonstrates how to accurately evaluate such co-design solutions using the RISC-V ecosystem. In a software-hardware co-design solution, part of the solution requires dedicated hardware. In our evaluation framework, we develop

    Updated: 2020-03-12
  • SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs
    arXiv.cs.AR Pub Date : 2019-10-20
    Mohammed Alser; Taha Shahroodi; Juan Gomez-Luna; Can Alkan; Onur Mutlu

    Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for the computationally costly sequence alignment step. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in only finding the optimal path

    Updated: 2020-03-12
  • LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations
    arXiv.cs.AR Pub Date : 2020-03-06
    SeyedRamin Rasoulinezhad; Siddhartha; Hao Zhou; Lingli Wang; David Boland; Philip H. W. Leong

    We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and

    Updated: 2020-03-09
  • A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision
    arXiv.cs.AR Pub Date : 2020-03-02
    Hongjie Wang; Yang Zhao; Chaojian Li; Yue Wang; Yingyan Lin

    The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns. To reduce the dominant data movement cost of training, process in-memory (PIM) has emerged as a promising solution as it alleviates the need to access DNN weights. However, state-of-the-art PIM DNN

    Updated: 2020-03-04
  • A Compiler Infrastructure for FPGA and ASIC Development
    arXiv.cs.AR Pub Date : 2020-02-29
    John Demme

    This whitepaper proposes a unified framework for hardware design tools to ease the development and inter-operability of said tools. By creating a large ecosystem of hardware development tools across vendors, academia, and the open source community, we hope to significantly increase much-needed productivity in hardware design.

    Updated: 2020-03-03
  • RapidLayout: Fast Hard Block Placement of FPGA-optimized Systolic Arrays using Evolutionary Algorithms
    arXiv.cs.AR Pub Date : 2020-02-17
    Niansong Zhang; Xiang Chen; Nachiket Kapre

    Evolutionary algorithms can outperform conventional simulated annealing placement on metrics such as runtime, wirelength, pipelining cost, and clock frequency when mapping FPGA hard block intensive designs such as systolic arrays on Xilinx UltraScale+ FPGAs. Such designs can take advantage of repeatable design organization of the arrays, the columnar arrangement of hard blocks such as DSPs and RAMs

    Updated: 2020-03-03
  • Using Name Confusion to Enhance Security
    arXiv.cs.AR Pub Date : 2019-11-05
    Mohamed Tarek Ibn Ziad; Miguel A. Arroyo; Evgeny Manzhosov; Vasileios P. Kemerlis; Simha Sethumadhavan

    Virtual memory is an abstraction that assigns references, or names, to data objects and instructions. Typically, instructions have exactly one name: a uniquely-identifiable virtual address. This mapping can be leveraged by adversaries to deterministically construct exploit payloads. In this work, we investigate how virtual memory should be redesigned to eliminate the need for this one-to-one mapping

    Updated: 2020-03-02
  • sBSNN: Stochastic-Bits Enabled Binary Spiking Neural Network with On-Chip Learning for Energy Efficient Neuromorphic Computing at the Edge
    arXiv.cs.AR Pub Date : 2020-02-25
    Minsuk Koo; Gopalakrishnan Srinivasan; Yong Shim; Kaushik Roy

    In this work, we propose stochastic Binary Spiking Neural Network (sBSNN) composed of stochastic spiking neurons and binary synapses (stochastic only during training) that computes probabilistically with one-bit precision for power-efficient and memory-compressed neuromorphic computing. We present an energy-efficient implementation of the proposed sBSNN using 'stochastic bit' as the core computational

    Updated: 2020-02-27
  • LORAX: Loss-Aware Approximations for Energy-Efficient Silicon Photonic Networks-on-Chip
    arXiv.cs.AR Pub Date : 2020-02-26
    Febin Sunny; Asif Mirza; Ishan Thakkar; Sudeep Pasricha; Nikdast Mahdi

    The approximate computing paradigm advocates for relaxing accuracy goals in applications to improve energy-efficiency and performance. Recently, this paradigm has been explored to improve the energy efficiency of silicon photonic networks-on-chip (PNoCs). In this paper, we propose a novel framework (LORAX) to enable more aggressive approximation during communication over silicon photonic links in PNoCs

    Updated: 2020-02-27
  • Optimality Study of Existing Quantum Computing Layout Synthesis Tools
    arXiv.cs.AR Pub Date : 2020-02-22
    Bochen Tan; Jason Cong

    Layout synthesis, an important step in quantum computing, processes quantum circuits to satisfy device layout constraints. In this paper, we construct QUEKO benchmarks for this problem, which have known optimal depth. We use QUEKO to evaluate the optimality of current layout synthesis tools, including Cirq from Google, Qiskit from IBM, $\mathsf{t}|\mathsf{ket}\rangle$ from Cambridge Quantum Computing

    Updated: 2020-02-25
  • Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
    arXiv.cs.AR Pub Date : 2020-02-24
    Florian Zaruba; Fabian Schuiki; Torsten Hoefler; Luca Benini

    Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust

    Updated: 2020-02-25
  • Addressing multiple bit/symbol errors in DRAM subsystem
    arXiv.cs.AR Pub Date : 2019-08-05
    Ravikiran Yeleswarapu; Arun K. Somani

    As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL based schemes to tolerate up-to one/two symbol errors per DRAM beat. Multi-symbol errors arising due to faults in multiple data buses and chips may not be detected by these schemes. In this paper, we introduce Single Symbol

    Updated: 2020-02-25
  • SpArch: Efficient Architecture for Sparse Matrix Multiplication
    arXiv.cs.AR Pub Date : 2020-02-20
    Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally

    Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various engineering and scientific applications. However, inner product based SpGEMM introduces redundant input fetches for mismatched nonzero operands, while the outer product based approach suffers from poor output locality due to numerous partial product matrices. Inefficiency in the reuse of either input or output data

    Updated: 2020-02-21
  • XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
    arXiv.cs.AR Pub Date : 2019-08-19
    Cheng Li; Abdul Dakkak; Jinjun Xiong; Wei Wei; Lingjie Xu; Wen-mei Hwu

    There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost

    Updated: 2020-02-20
  • Information Theory as a Means of Determining the Main Factors Affecting the Processors Architecture
    arXiv.cs.AR Pub Date : 2020-02-17
    Anton Rakitskiy; Boris Ryabko

    In this article we investigate the development of computers over the past decades in order to identify the factors that influence it the most. We describe such factors and use them to predict the direction of further development. To solve these problems, we use the concept of the Computer Capacity, which allows us to estimate the performance of computers theoretically, relying only on the

    Updated: 2020-02-19
  • Design of SEC-DED and SEC-DED-DAEC Codes of different lengths
    arXiv.cs.AR Pub Date : 2020-02-18
    Sayan Tripathi; Jhilam Jana; Jaydeb Bhaumik

    Reliability is an important requirement for both communication and storage systems. Due to continuous technology scaling, the probability of multiple adjacent bit errors increases. The data may be corrupted due to soft errors. Error correction codes are used to detect and correct the errors. In this paper, design of single error correction-double error detection (SEC-DED) and single error correction-double

    Updated: 2020-02-19
  • The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface
    arXiv.cs.AR Pub Date : 2019-10-15
    Hamid Reza Zohouri; Satoshi Matsuoka

    Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for different applications on FPGAs using HLS. However, a comprehensive analysis of the behavior and efficiency of the memory controller of FPGAs is missing in literature

    Updated: 2020-02-17
  • CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural Network Approach
    arXiv.cs.AR Pub Date : 2020-02-13
    Mohammad Saeed Abrishami; Massoud Pedram; Shahin Nazarian

    The miniaturization of transistors down to 5nm and beyond, plus the increasing complexity of integrated circuits, significantly aggravate short channel effects, and demand analysis and optimization of more design corners and modes. Simulators need to model output variables related to circuit timing, power, noise, etc., which exhibit nonlinear behavior. The existing simulation and sign-off tools, based

    Updated: 2020-02-14
  • Functional Failure Rate Due to Single-Event Transients in Clock Distribution Networks
    arXiv.cs.AR Pub Date : 2020-02-13
    Thomas Lange; Maximilien Glorieux; Dan Alexandrescu; Luca Sterpone

    With technology scaling, lower supply voltages, and higher operating frequencies, clock distribution networks become more and more vulnerable to transient faults. These faults can cause circuit-wide effects and thus significantly contribute to the functional failure rate of the circuit. This paper proposes a methodology to analyse how the functional behaviour is affected by Single-Event Transients

    Updated: 2020-02-14
  • AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference
    arXiv.cs.AR Pub Date : 2019-09-29
    Thierry Tambe; En-Yu Yang; Zishen Wan; Yuntian Deng; Vijay Janapa Reddi; Alexander Rush; David Brooks; Gu-Yeon Wei

    Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally

    Updated: 2020-02-12
  • RVCoreP : An optimized RISC-V soft processor of five-stage pipelining
    arXiv.cs.AR Pub Date : 2020-02-10
    Hiromu Miyazaki; Takuto Kanamori; Md Ashraful Islam; Kenji Kise

    RISC-V is a RISC-based open and royalty-free instruction set architecture which has been developed since 2010, and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as RV32I, which is sufficient to support the operating system environment and is well suited to embedded systems. In this paper, we propose an optimized RV32I soft processor named

    Updated: 2020-02-11
  • A portable and Linux capable RISC-V computer system in Verilog HDL
    arXiv.cs.AR Pub Date : 2020-02-10
    Junya Miura; Hiromu Miyazaki; Kenji Kise

    RISC-V is an open and royalty-free instruction set architecture which has been developed at the University of California, Berkeley. The processors using RISC-V can be designed and released freely. Because of this, various processor cores and systems on chip (SoCs) have been released so far. However, there are few public RISC-V computer systems that are portable and can boot Linux operating systems

    Updated: 2020-02-11
  • Isolation-Aware Timing Analysis and Design Space Exploration for Predictable and Composable Many-Core Systems
    arXiv.cs.AR Pub Date : 2019-05-31
    Behnaz Pourmohseni; Fedor Smirnov; Stefan Wildermann; Jürgen Teich

    Composable many-core systems enable the independent development and analysis of applications which will be executed on a shared platform where the mix of concurrently executed applications may change dynamically at run time. For each individual application, an off-line Design Space Exploration (DSE) is performed to compute several mapping alternatives on the platform, offering Pareto-optimal trade-offs

    Updated: 2020-02-11
  • Fast FPGA emulation of analog dynamics in digitally-driven systems
    arXiv.cs.AR Pub Date : 2020-02-06
    Steven Herbst; Byong Chan Lim; Mark Horowitz

    In this paper, we propose an architecture for FPGA emulation of mixed-signal systems that achieves high accuracy at a high throughput. We represent the analog output of a block as a superposition of step responses to changes in its analog input, and the output is evaluated only when needed by the digital subsystem. Our architecture is therefore intended for digitally-driven systems; that is, those

    Updated: 2020-02-07
  • Low Overhead Online Data Flow Tracking for Intermittently Powered Non-volatile FPGAs
    arXiv.cs.AR Pub Date : 2020-02-06
    Xinyi Zhang; Clay Patterson; Yongpan Liu; Chengmo Yang; Chun Jason Xue; Jingtong Hu

    Energy harvesting is an attractive way to power future IoT devices since it can eliminate the need for batteries or power cables. However, harvested energy is intrinsically unstable. While FPGAs have been widely adopted in various embedded systems, it is hard for them to survive unstable power since all the memory components in FPGAs are based on volatile SRAMs. The emerging non-volatile memory based FPGAs provide

    Updated: 2020-02-07
  • FPGA Acceleration of Sequence Alignment: A Survey
    arXiv.cs.AR Pub Date : 2020-02-05
    Sahand Salamat; Tajana Rosing

    Genomics is changing our understanding of humans, evolution, diseases, and medicines, to name but a few. As sequencing technology develops, collecting DNA sequences takes less time, thereby generating more genetic data every day. Today the rate of generating genetic data is outpacing the rate of computation power growth. Current sequencing machines can sequence 50 human genomes per day; however, aligning

    Updated: 2020-02-07
  • Efficient Fault Injection based on Dynamic HDL Slicing Technique
    arXiv.cs.AR Pub Date : 2020-01-24
    Ahmet Cagri Bagbaba; Maksim Jenihhin; Jaan Raik; Christian Sauer

    This work proposes a fault injection methodology where Hardware Description Language (HDL) code slicing is exploited to prune fault injection locations, thus enabling more efficient campaigns for safety mechanisms evaluation. In particular, the dynamic HDL slicing technique provides for a highly collapsed critical fault list and allows avoiding injections at redundant locations or time-steps. Experimental

    Updated: 2020-02-04
  • SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
    arXiv.cs.AR Pub Date : 2019-09-02
    Ye Yu; Niraj K. Jha

    CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism. However, potential performance speedup from sparsity has not been adequately addressed. The computation and memory

    Updated: 2020-02-04
  • Automatic Generation of Application-Specific FPGA Overlays with RapidWright
    arXiv.cs.AR Pub Date : 2020-01-31
    Joel Mandebi Mbongue; Danielle Tchuinkou Kwadjo; Christophe Bobda

    Overlay architectures implemented on FPGA devices have been proposed as a means to increase FPGA adoption in general-purpose computing. They provide the benefits of software such as flexibility and programmability, thus making it easier to build dedicated compilers. However, existing overlays are generic, resource and power hungry with performance usually an order of magnitude lower than bare metal

    Updated: 2020-02-03
  • HEAX: An Architecture for Computing on Encrypted Data
    arXiv.cs.AR Pub Date : 2019-09-20
    M. Sadegh Riazi; Kim Laine; Blake Pelton; Wei Dai

    With the rapid increase in cloud computing, concerns surrounding data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners cannot outsource the computation due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention

    Updated: 2020-01-27
  • Regional Clock Tree Generation by Abutment in Synchoros VLSI Design
    arXiv.cs.AR Pub Date : 2019-10-24
    Dimitrios Stathis; Panagiotis Chaourani; Syed M. A. H. Jafri; Ahmed Hemani

    Synchoros VLSI design style has been proposed as an alternative to standard cell-based design. Standard cells are replaced by synchoros large grain VLSI design objects called SiLago blocks. This new design style enables end-to-end automation of large scale designs by abutting the SiLago blocks to eliminate logic and physical synthesis for the end-users. A key problem in this automation process is the

    Updated: 2020-01-23
  • SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
    arXiv.cs.AR Pub Date : 2020-01-20
    Javier Picorel; Seyed Alireza Sanaee Kohroudi; Zi Yan; Abhishek Bhattacharjee; Babak Falsafi; Djordje Jevdjic

    Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to

    Updated: 2020-01-22
  • A Scalable Decoder Micro-architecture for Fault-Tolerant Quantum Computing
    arXiv.cs.AR Pub Date : 2020-01-18
    Poulami Das; Christopher A. Pattison; Srilatha Manne; Douglas Carmean; Krysta Svore; Moinuddin Qureshi; Nicolas Delfosse

    Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than in classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum algorithm. The decoder is a key component of the error correction scheme whose role is to identify errors faster

    Updated: 2020-01-22
  • Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX
    arXiv.cs.AR Pub Date : 2020-01-21
    Youren Shen; Hongliang Tian; Yu Chen; Kang Chen; Runji Wang; Yi Xu; Yubin Xia

    Intel Software Guard Extensions (SGX) enables user-level code to create private memory regions called enclaves, whose code and data are protected by the CPU from software and hardware attacks outside the enclaves. Recent work introduces library operating systems (LibOSes) to SGX so that legacy applications can run inside enclaves with few or even no modifications. As virtually any non-trivial application

    Updated: 2020-01-22
  • Eva-CiM: A System-Level Performance and Energy Evaluation Framework for Computing-in-Memory Architectures
    arXiv.cs.AR Pub Date : 2019-01-27
    Di Gao; Dayane Reis; Xiaobo Sharon Hu; Cheng Zhuo

    Computing-in-Memory (CiM) architectures aim to reduce costly data transfers by performing arithmetic and logic operations in memory and hence relieve the pressure due to the memory wall. However, determining whether a given workload can really benefit from CiM, which memory hierarchy and what device technology should be adopted by a CiM architecture requires in-depth study that is not only time consuming

    Updated: 2020-01-16
  • CHIPKIT: An agile, reusable open-source framework for rapid test chip development
    arXiv.cs.AR Pub Date : 2020-01-13
    Paul Whatmough; Marco Donato; Glenn Ko; David Brooks; Gu-Yeon Wei

    The current trend for domain-specific architectures (DSAs) has led to renewed interest in research test chips to demonstrate new specialized hardware. Tape-outs also offer huge pedagogical value garnered from real hands-on exposure to the whole system stack. However, successful tape-outs demand hard-earned experience, and the design process is time consuming and fraught with challenges. Therefore,

    Updated: 2020-01-15
  • Processing Distribution and Architecture Tradeoff for Large Intelligent Surface Implementation
    arXiv.cs.AR Pub Date : 2020-01-14
    Jesus Rodriguez Sanchez; Ove Edfors; Fredrik Rusek; Liang Liu

    The Large Intelligent Surface (LIS) concept has emerged recently as a new paradigm for wireless communication, remote sensing and positioning. Despite its potential, there are a lot of challenges from an implementation point of view, with the interconnection data-rate and computational complexity being the most relevant. Distributed processing techniques and hierarchical architectures are expected

    Updated: 2020-01-15
  • Hardware Implementation of Neural Self-Interference Cancellation
    arXiv.cs.AR Pub Date : 2020-01-13
    Yann Kurzo; Andreas Toftegaard Kristensen; Andreas Burg; Alexios Balatsoukas-Stimming

    In-band full-duplex systems can transmit and receive information simultaneously on the same frequency band. However, due to the strong self-interference caused by the transmitter to its own receiver, the use of non-linear digital self-interference cancellation is essential. In this work, we describe a hardware architecture for a neural network-based non-linear self-interference (SI) canceller and we

    Updated: 2020-01-15
  • Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
    arXiv.cs.AR Pub Date : 2020-01-14
    Chuteng Zhou; Prad Kadambi; Matthew Mattina; Paul N. Whatmough

    The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics

    Updated: 2020-01-15
  • Ratatoskr: An open-source framework for in-depth power, performance and area analysis in 3D NoCs
    arXiv.cs.AR Pub Date : 2019-12-11
    Jan Moritz Joseph; Lennart Bamberg; Imad Hajjar; Anna Drewes; Behnam Razi Perjikolaei; Alberto García-Ortiz; Thilo Pionteck

    We introduce ratatoskr, an open-source framework for in-depth power, performance and area (PPA) analysis in NoCs for 3D-integrated and heterogeneous System-on-Chips (SoCs). It covers all layers of abstraction by providing a NoC hardware implementation on RT level, a NoC simulator on cycle-accurate level and an application model on transaction level. By this comprehensive approach, ratatoskr can provide

    Updated: 2020-01-15
  • SERAD: Soft Error Resilient Asynchronous Design using a Bundled Data Protocol
    arXiv.cs.AR Pub Date : 2020-01-13
    Sai Aparna Aketi; Smriti Gupta; Huimei Cheng; Joycee Mekie; Peter A. Beerel

    The risk of soft errors due to radiation continues to be a significant challenge for engineers trying to build systems that can handle harsh environments. Building systems that are Radiation Hardened by Design (RHBD) is the preferred approach, but existing techniques are expensive in terms of performance, power, and/or area. This paper introduces a novel soft-error resilient asynchronous bundled-data

    Updated: 2020-01-14
  • Optimizing the Write Fidelity of MRAMs
    arXiv.cs.AR Pub Date : 2020-01-11
    Yongjune Kim; Yoocharn Jeon; Cyril Guyot; Yuval Cassuto

    Magnetic random-access memory (MRAM) is a promising memory technology due to its high density, non-volatility, and high endurance. However, achieving high memory fidelity incurs significant write-energy costs, which should be reduced for large-scale deployment of MRAMs. In this paper, we formulate an optimization problem for maximizing the memory fidelity given energy constraints, and propose a biconvex

    Updated: 2020-01-14
  • Boosting the Bounds of Symbolic QED for Effective Pre-Silicon Verification of Processor Cores
    arXiv.cs.AR Pub Date : 2019-08-19
    Karthik Ganesan; Srinivasa Shashank Nuthakki

    Existing techniques to ensure functional correctness and hardware trust during pre-silicon verification face severe limitations. In this work, we systematically leverage two key ideas: 1) Symbolic Quick Error Detection (Symbolic QED or SQED), a recent bug detection and localization technique using Bounded Model Checking (BMC); and 2) Symbolic starting states, to present a method that: i) Effectively

    Updated: 2020-01-09
  • Low-cost Stochastic Number Generators for Stochastic Computing
    arXiv.cs.AR Pub Date : 2020-01-03
    Sayed Ahmad Salehi

    Stochastic unary computing provides low-area circuits. However, the area-consuming stochastic number generators (SNGs) required in these circuits can diminish their overall gain in area, particularly if several SNGs are required. We propose area-efficient SNGs by sharing the permuted output of one linear feedback shift register (LFSR) among several SNGs. With no hardware overhead, the proposed architecture

    Updated: 2020-01-08
  • Stochastic Rounding: Algorithms and Hardware Accelerator
    arXiv.cs.AR Pub Date : 2020-01-06
    Mantas Mikaitis

    General algorithms and a hardware accelerator for performing stochastic rounding (SR) are presented. The main goal is to augment the ARM M4F based multi-core processor SpiNNaker 2 with a more flexible rounding functionality than is available in the ARM processor itself. The motivation of adding such an accelerator in hardware is based on our previous results showing improvements in numerical accuracy

    Updated: 2020-01-07
  • Fast Modeling L2 Cache Reuse Distance Histograms Using Combined Locality Information from Software Traces
    arXiv.cs.AR Pub Date : 2019-07-11
    Ming Ling; Jiancong Ge; Guangmin Wang

    To mitigate the performance gap between CPU and the main memory, multi-level cache architectures are widely used in modern processors. Therefore, modeling the behaviors of the downstream caches becomes a critical part of the processor performance evaluation in the early stage of Design Space Exploration (DSE). In this paper, we propose a fast and accurate L2 cache reuse distance histogram model, which

    Updated: 2020-01-07
  • TrappeD: DRAM Trojan Designs for Information Leakage and Fault Injection Attacks
    arXiv.cs.AR Pub Date : 2020-01-03
    Karthikeyan Nagarajan; Asmit De; Mohammad Nasim Imtiaz Khan; Swaroop Ghosh

    In this paper, we investigate the advanced circuit features such as wordline- (WL) underdrive (prevents retention failure) and overdrive (assists write) employed in the peripherals of Dynamic RAM (DRAM) memories from a security perspective. In an ideal environment, these features ensure fast and reliable read and write operations. However, an adversary can re-purpose them by inserting Trojans to deliver

    Updated: 2020-01-06
Contents have been reproduced by permission of the publishers.