• arXiv.cs.AR Pub Date : 2020-03-25
Andreas Grübl; Sebastian Billaudelle; Benjamin Cramer; Vitali Karasenko; Johannes Schemmel

This paper presents verification and implementation methods that have been developed for the design of the BrainScaleS-2 65nm ASICs. The 2nd generation BrainScaleS chips are mixed-signal devices with tight coupling between full-custom analog neuromorphic circuits and two general purpose microprocessors (PPU) with SIMD extension for on-chip learning and plasticity. Simulation methods for automated analysis

Updated: 2020-03-26
• arXiv.cs.AR Pub Date : 2020-03-21
Khanh N. Dang; Yuichi Okuyama; Abderazek Ben Abdallah

The Network-on-Chip (NoC) paradigm has been proposed as a promising solution to handle the strict communication requirements between the increasingly large number of cores on multi- and many-core chips. However, NoC systems are exposed to a variety of manufacturing, design, and energetic-particle factors that make them vulnerable to permanent (hard) faults and transient (soft) errors. In this paper

Updated: 2020-03-24
• arXiv.cs.AR Pub Date : 2020-03-23
Sumon Kumar Bose; Vivek Mohan; Arindam Basu

This paper presents an in-memory computing (IMC) architecture for image denoising. The proposed SRAM based in-memory processing framework works in tandem with approximate computing on a binary image generated from neuromorphic vision sensors. Implemented in TSMC 65nm process, the proposed architecture enables approximately 2000X energy savings (approximately 222X from IMC) compared to a digital implementation

Updated: 2020-03-24
• arXiv.cs.AR Pub Date : 2020-03-21
Khanh N Dang; Michael Meyer; Yuichi Okuyama; Abderazek Ben Abdallah

Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as a promising solution, merging the high parallelism of the Network-on-Chip (NoC) paradigm with the high performance and low power cost of 3D-ICs. However, as technology scales down, reliability issues are becoming more critical, especially for complex 3D-NoCs, which serve the communication requirements of multi- and many-core systems-on-chip

Updated: 2020-03-24
• arXiv.cs.AR Pub Date : 2020-03-19
Khanh N. Dang; Akram Ben Ahmed; Abderazek Ben Abdallah; Xuan-Tu Tran

By combining Three-Dimensional Integrated Circuits (3D-ICs) with the Network-on-Chip infrastructure to obtain 3D Networks-on-Chip (3D-NoCs), the new on-chip communication paradigm brings several advantages: lower power, smaller footprint, and lower latency. However, thermal dissipation is one of the most critical challenges for 3D-ICs, where heat cannot easily transfer through several layers of silicon

Updated: 2020-03-20
• arXiv.cs.AR Pub Date : 2020-03-16
Archisman Ghosh; Debayan Das; Shreyas Sen

Mathematically-secure cryptographic algorithms leak significant side channel information through their power supplies when implemented on a physical platform. These side channel leakages can be exploited by an attacker to extract the secret key of an embedded device. The existing state-of-the-art countermeasures mainly focus on the power balancing, gate-level masking, or signal-to-noise (SNR) reduction

Updated: 2020-03-18
• arXiv.cs.AR Pub Date : 2020-03-12
Pai-Yu Tan; Po-Yao Chuang; Yen-Ting Lin; Cheng-Wen Wu; Juin-Ming Lu

Neural network hardware is considered an essential part of future edge devices. In this paper, we propose a binary-weight spiking neural network (BW-SNN) hardware architecture for low-power real-time object classification on edge platforms. This design stores a full neural network on-chip, and hence requires no off-chip bandwidth. The proposed systolic array maximizes data reuse for a typical convolutional

Updated: 2020-03-16
• arXiv.cs.AR Pub Date : 2020-03-11
Riaz-ul-haque Mian; Michihiro Shintani; Michiko Inoue

Software-hardware co-design solutions for decimal computation can provide several Pareto points for the development of embedded systems in terms of hardware cost and performance. This paper demonstrates how to accurately evaluate such co-design solutions using the RISC-V ecosystem. In a software-hardware co-design solution, part of the solution requires dedicated hardware. In our evaluation framework, we develop

Updated: 2020-03-12
• arXiv.cs.AR Pub Date : 2019-10-20
Mohammed Alser; Taha Shahroodi; Juan Gomez-Luna; Can Alkan; Onur Mutlu

Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for the computationally costly sequence alignment step. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in only finding the optimal path

Updated: 2020-03-12
• arXiv.cs.AR Pub Date : 2020-03-06
SeyedRamin Rasoulinezhad; Siddhartha; Hao Zhou; Lingli Wang; David Boland; Philip H. W. Leong

We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and

Updated: 2020-03-09
• arXiv.cs.AR Pub Date : 2020-03-02
Hongjie Wang; Yang Zhao; Chaojian Li; Yue Wang; Yingyan Lin

The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns. To reduce the dominant data movement cost of training, process in-memory (PIM) has emerged as a promising solution as it alleviates the need to access DNN weights. However, state-of-the-art PIM DNN

Updated: 2020-03-04
• arXiv.cs.AR Pub Date : 2020-02-29
John Demme

This whitepaper proposes a unified framework for hardware design tools to ease the development and interoperability of said tools. By creating a large ecosystem of hardware development tools across vendors, academia, and the open source community, we hope to significantly increase much-needed productivity in hardware design.

Updated: 2020-03-03
• arXiv.cs.AR Pub Date : 2020-02-17
Niansong Zhang; Xiang Chen; Nachiket Kapre

Evolutionary algorithms can outperform conventional simulated annealing placement on metrics such as runtime, wirelength, pipelining cost, and clock frequency when mapping FPGA hard block intensive designs such as systolic arrays on Xilinx UltraScale+ FPGAs. Such designs can take advantage of repeatable design organization of the arrays, the columnar arrangement of hard blocks such as DSPs and RAMs

Updated: 2020-03-03
• arXiv.cs.AR Pub Date : 2019-11-05
Mohamed Tarek Ibn Ziad; Miguel A. Arroyo; Evgeny Manzhosov; Vasileios P. Kemerlis; Simha Sethumadhavan

Virtual memory is an abstraction that assigns references, or names, to data objects and instructions. Typically, instructions have exactly one name: a uniquely-identifiable virtual address. This mapping can be leveraged by adversaries to deterministically construct exploit payloads. In this work, we investigate how virtual memory should be redesigned to eliminate the need for this one-to-one mapping

Updated: 2020-03-02
• arXiv.cs.AR Pub Date : 2020-02-25
Minsuk Koo; Gopalakrishnan Srinivasan; Yong Shim; Kaushik Roy

In this work, we propose a stochastic Binary Spiking Neural Network (sBSNN) composed of stochastic spiking neurons and binary synapses (stochastic only during training) that computes probabilistically with one-bit precision for power-efficient and memory-compressed neuromorphic computing. We present an energy-efficient implementation of the proposed sBSNN using 'stochastic bit' as the core computational

Updated: 2020-02-27
• arXiv.cs.AR Pub Date : 2020-02-26
Febin Sunny; Asif Mirza; Ishan Thakkar; Sudeep Pasricha; Nikdast Mahdi

The approximate computing paradigm advocates for relaxing accuracy goals in applications to improve energy-efficiency and performance. Recently, this paradigm has been explored to improve the energy efficiency of silicon photonic networks-on-chip (PNoCs). In this paper, we propose a novel framework (LORAX) to enable more aggressive approximation during communication over silicon photonic links in PNoCs

Updated: 2020-02-27
• arXiv.cs.AR Pub Date : 2020-02-22
Bochen Tan; Jason Cong

Layout synthesis, an important step in quantum computing, processes quantum circuits to satisfy device layout constraints. In this paper, we construct QUEKO benchmarks for this problem, which have known optimal depth. We use QUEKO to evaluate the optimality of current layout synthesis tools, including Cirq from Google, Qiskit from IBM, $\mathsf{t}|\mathsf{ket}\rangle$ from Cambridge Quantum Computing

Updated: 2020-02-25
• arXiv.cs.AR Pub Date : 2020-02-24
Florian Zaruba; Fabian Schuiki; Torsten Hoefler; Luca Benini

Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust

Updated: 2020-02-25
• arXiv.cs.AR Pub Date : 2019-08-05
Ravikiran Yeleswarapu; Arun K. Somani

As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Multi-symbol errors arising due to faults in multiple data buses and chips may not be detected by these schemes. In this paper, we introduce Single Symbol

Updated: 2020-02-25
• arXiv.cs.AR Pub Date : 2020-02-20
Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various engineering and scientific applications. However, inner-product-based SpGEMM introduces redundant input fetches for mismatched nonzero operands, while the outer-product-based approach suffers from poor output locality due to numerous partial product matrices. Inefficiency in the reuse of either input or output data

Updated: 2020-02-21
• arXiv.cs.AR Pub Date : 2019-08-19
Cheng Li; Abdul Dakkak; Jinjun Xiong; Wei Wei; Lingjie Xu; Wen-mei Hwu

There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost

Updated: 2020-02-20
• arXiv.cs.AR Pub Date : 2020-02-17
Anton Rakitskiy; Boris Ryabko

In this article we investigate the development of computers over the past decades in order to identify the factors that influence it the most. We describe these factors and use them to predict the direction of further development. To solve these problems, we use the concept of Computer Capacity, which allows us to estimate the performance of computers theoretically, relying only on the

Updated: 2020-02-19
• arXiv.cs.AR Pub Date : 2020-02-18
Sayan Tripathi; Jhilam Jana; Jaydeb Bhaumik

Reliability is an important requirement for both communication and storage systems. Due to continuous technology scaling, the probability of errors in multiple adjacent bits increases. The data may be corrupted due to soft errors. Error correction codes are used to detect and correct the errors. In this paper, design of single error correction-double error detection (SEC-DED) and single error correction-double

Updated: 2020-02-19
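
To make the SEC-DED idea in the abstract above concrete, here is a minimal sketch using the textbook extended Hamming (8,4) code. This is an assumption chosen for illustration: the truncated abstract does not say which code constructions the paper designs, and the names `encode` and `decode` are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming(7,4) positions with check bits at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    code[2] = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    code[4] = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    code[0] = sum(code[1:]) % 2     # overall parity over positions 1..7
    return code


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'double_error'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos         # XOR of the indices of all set bits
    overall = sum(word) % 2         # 0 for any valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one. Syndrome 0 means the overall
        # parity bit itself (index 0) was hit.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped.
        return None, "double_error"
    return [word[3], word[5], word[6], word[7]], status
```

Flipping any single bit of `encode([1, 0, 1, 1])` and passing the result to `decode` returns the original data with status `corrected`; flipping any two bits returns `(None, "double_error")`.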
• arXiv.cs.AR Pub Date : 2019-10-15
Hamid Reza Zohouri; Satoshi Matsuoka

Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for different applications on FPGAs using HLS. However, a comprehensive analysis of the behavior and efficiency of the memory controller of FPGAs is missing in the literature

Updated: 2020-02-17
• arXiv.cs.AR Pub Date : 2020-02-13
Mohammad Saeed Abrishami; Massoud Pedram; Shahin Nazarian

The miniaturization of transistors down to 5nm and beyond, plus the increasing complexity of integrated circuits, significantly aggravate short channel effects, and demand analysis and optimization of more design corners and modes. Simulators need to model output variables related to circuit timing, power, noise, etc., which exhibit nonlinear behavior. The existing simulation and sign-off tools, based

Updated: 2020-02-14
• arXiv.cs.AR Pub Date : 2020-02-13
Thomas Lange; Maximilien Glorieux; Dan Alexandrescu; Luca Sterpone

With technology scaling, lower supply voltages, and higher operating frequencies, clock distribution networks become more and more vulnerable to transient faults. These faults can cause circuit-wide effects and thus significantly contribute to the functional failure rate of the circuit. This paper proposes a methodology to analyse how the functional behaviour is affected by Single-Event Transients

Updated: 2020-02-14
• arXiv.cs.AR Pub Date : 2019-09-29
Thierry Tambe; En-Yu Yang; Zishen Wan; Yuntian Deng; Vijay Janapa Reddi; Alexander Rush; David Brooks; Gu-Yeon Wei

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally

Updated: 2020-02-12
• arXiv.cs.AR Pub Date : 2020-02-10
Hiromu Miyazaki; Takuto Kanamori; Md Ashraful Islam; Kenji Kise

RISC-V is a RISC-based open and royalty-free instruction set architecture which has been developed since 2010, and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as RV32I, which is sufficient to support an operating system environment and suits embedded systems. In this paper, we propose an optimized RV32I soft processor named

Updated: 2020-02-11
• arXiv.cs.AR Pub Date : 2020-02-10
Junya Miura; Hiromu Miyazaki; Kenji Kise

RISC-V is an open and royalty-free instruction set architecture which has been developed at the University of California, Berkeley. Processors using RISC-V can be designed and released freely. Because of this, various processor cores and systems-on-chip (SoCs) have been released so far. However, there are few public RISC-V computer systems that are portable and can boot Linux operating systems

Updated: 2020-02-11
• arXiv.cs.AR Pub Date : 2019-05-31
Behnaz Pourmohseni; Fedor Smirnov; Stefan Wildermann; Jürgen Teich

Composable many-core systems enable the independent development and analysis of applications which will be executed on a shared platform where the mix of concurrently executed applications may change dynamically at run time. For each individual application, an off-line Design Space Exploration (DSE) is performed to compute several mapping alternatives on the platform, offering Pareto-optimal trade-offs

Updated: 2020-02-11
• arXiv.cs.AR Pub Date : 2020-02-06
Steven Herbst; Byong Chan Lim; Mark Horowitz

In this paper, we propose an architecture for FPGA emulation of mixed-signal systems that achieves high accuracy at a high throughput. We represent the analog output of a block as a superposition of step responses to changes in its analog input, and the output is evaluated only when needed by the digital subsystem. Our architecture is therefore intended for digitally-driven systems; that is, those

Updated: 2020-02-07
• arXiv.cs.AR Pub Date : 2020-02-06
Xinyi Zhang; Clay Patterson; Yongpan Liu; Chengmo Yang; Chun Jason Xue; Jingtong Hu

Energy harvesting is an attractive way to power future IoT devices since it can eliminate the need for batteries or power cables. However, harvested energy is intrinsically unstable. While FPGAs have been widely adopted in various embedded systems, they struggle to survive unstable power since all the memory components in FPGAs are based on volatile SRAM. The emerging non-volatile memory based FPGAs provide

Updated: 2020-02-07
• arXiv.cs.AR Pub Date : 2020-02-05
Sahand Salamat; Tajana Rosing

Genomics is changing our understanding of humans, evolution, diseases, and medicines, to name but a few. As sequencing technology develops, collecting DNA sequences takes less time, thereby generating more genetic data every day. Today the rate of generating genetic data is outpacing the rate of computation power growth. Current sequencing machines can sequence 50 human genomes per day; however, aligning

Updated: 2020-02-07
• arXiv.cs.AR Pub Date : 2020-01-24
Ahmet Cagri Bagbaba; Maksim Jenihhin; Jaan Raik; Christian Sauer

This work proposes a fault injection methodology where Hardware Description Language (HDL) code slicing is exploited to prune fault injection locations, thus enabling more efficient campaigns for safety mechanisms evaluation. In particular, the dynamic HDL slicing technique provides for a highly collapsed critical fault list and allows avoiding injections at redundant locations or time-steps. Experimental

Updated: 2020-02-04
• arXiv.cs.AR Pub Date : 2019-09-02
Ye Yu; Niraj K. Jha

CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism. However, potential performance speedup from sparsity has not been adequately addressed. The computation and memory

Updated: 2020-02-04
• arXiv.cs.AR Pub Date : 2020-01-31
Joel Mandebi Mbongue; Danielle Tchuinkou Kwadjo; Christophe Bobda

Overlay architectures implemented on FPGA devices have been proposed as a means to increase FPGA adoption in general-purpose computing. They provide the benefits of software such as flexibility and programmability, thus making it easier to build dedicated compilers. However, existing overlays are generic, resource and power hungry with performance usually an order of magnitude lower than bare metal

Updated: 2020-02-03
• arXiv.cs.AR Pub Date : 2019-09-20
M. Sadegh Riazi; Kim Laine; Blake Pelton; Wei Dai

With the rapid increase in cloud computing, concerns surrounding data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners cannot outsource the computation due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention

Updated: 2020-01-27
• arXiv.cs.AR Pub Date : 2019-10-24
Dimitrios Stathis; Panagiotis Chaourani; Syed M. A. H. Jafri; Ahmed Hemani

Synchoros VLSI design style has been proposed as an alternative to standard cell-based design. Standard cells are replaced by synchoros large grain VLSI design objects called SiLago blocks. This new design style enables end-to-end automation of large scale designs by abutting the SiLago blocks to eliminate logic and physical synthesis for the end-users. A key problem in this automation process is the

Updated: 2020-01-23
• arXiv.cs.AR Pub Date : 2020-01-20
Javier Picorel; Seyed Alireza Sanaee Kohroudi; Zi Yan; Abhishek Bhattacharjee; Babak Falsafi; Djordje Jevdjic

Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to

Updated: 2020-01-22
• arXiv.cs.AR Pub Date : 2020-01-18
Poulami Das; Christopher A. Pattison; Srilatha Manne; Douglas Carmean; Krysta Svore; Moinuddin Qureshi; Nicolas Delfosse

Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum algorithm. The decoder is a key component of the error correction scheme whose role is to identify errors faster

Updated: 2020-01-22
• arXiv.cs.AR Pub Date : 2020-01-21
Youren Shen; Hongliang Tian; Yu Chen; Kang Chen; Runji Wang; Yi Xu; Yubin Xia

Intel Software Guard Extensions (SGX) enables user-level code to create private memory regions called enclaves, whose code and data are protected by the CPU from software and hardware attacks outside the enclaves. Recent work introduces library operating systems (LibOSes) to SGX so that legacy applications can run inside enclaves with few or even no modifications. As virtually any non-trivial application

Updated: 2020-01-22
• arXiv.cs.AR Pub Date : 2019-01-27
Di Gao; Dayane Reis; Xiaobo Sharon Hu; Cheng Zhuo

Computing-in-Memory (CiM) architectures aim to reduce costly data transfers by performing arithmetic and logic operations in memory and hence relieve the pressure due to the memory wall. However, determining whether a given workload can really benefit from CiM, which memory hierarchy and what device technology should be adopted by a CiM architecture requires in-depth study that is not only time consuming

Updated: 2020-01-16
• arXiv.cs.AR Pub Date : 2020-01-13
Paul Whatmough; Marco Donato; Glenn Ko; David Brooks; Gu-Yeon Wei

The current trend for domain-specific architectures (DSAs) has led to renewed interest in research test chips to demonstrate new specialized hardware. Tape-outs also offer huge pedagogical value garnered from real hands-on exposure to the whole system stack. However, successful tape-outs demand hard-earned experience, and the design process is time consuming and fraught with challenges. Therefore,

Updated: 2020-01-15
• arXiv.cs.AR Pub Date : 2020-01-14
Jesus Rodriguez Sanchez; Ove Edfors; Fredrik Rusek; Liang Liu

The Large Intelligent Surface (LIS) concept has emerged recently as a new paradigm for wireless communication, remote sensing and positioning. Despite its potential, there are many challenges from an implementation point of view, with the interconnection data-rate and computational complexity being the most relevant. Distributed processing techniques and hierarchical architectures are expected

Updated: 2020-01-15
• arXiv.cs.AR Pub Date : 2020-01-13
Yann Kurzo; Andreas Toftegaard Kristensen; Andreas Burg; Alexios Balatsoukas-Stimming

In-band full-duplex systems can transmit and receive information simultaneously on the same frequency band. However, due to the strong self-interference caused by the transmitter to its own receiver, the use of non-linear digital self-interference cancellation is essential. In this work, we describe a hardware architecture for a neural network-based non-linear self-interference (SI) canceller and we

Updated: 2020-01-15
• arXiv.cs.AR Pub Date : 2020-01-14

The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics

Updated: 2020-01-15
• arXiv.cs.AR Pub Date : 2019-12-11
Jan Moritz Joseph; Lennart Bamberg; Imad Hajjar; Anna Drewes; Behnam Razi Perjikolaei; Alberto García-Ortiz; Thilo Pionteck

We introduce ratatoskr, an open-source framework for in-depth power, performance and area (PPA) analysis in NoCs for 3D-integrated and heterogeneous System-on-Chips (SoCs). It covers all layers of abstraction by providing a NoC hardware implementation on RT level, a NoC simulator on cycle-accurate level and an application model on transaction level. By this comprehensive approach, ratatoskr can provide

Updated: 2020-01-15
• arXiv.cs.AR Pub Date : 2020-01-13
Sai Aparna Aketi; Smriti Gupta; Huimei Cheng; Joycee Mekie; Peter A. Beerel

The risk of soft errors due to radiation continues to be a significant challenge for engineers trying to build systems that can handle harsh environments. Building systems that are Radiation Hardened by Design (RHBD) is the preferred approach, but existing techniques are expensive in terms of performance, power, and/or area. This paper introduces a novel soft-error resilient asynchronous bundled-data

Updated: 2020-01-14
• arXiv.cs.AR Pub Date : 2020-01-11
Yongjune Kim; Yoocharn Jeon; Cyril Guyot; Yuval Cassuto

Magnetic random-access memory (MRAM) is a promising memory technology due to its high density, non-volatility, and high endurance. However, achieving high memory fidelity incurs significant write-energy costs, which should be reduced for large-scale deployment of MRAMs. In this paper, we formulate an optimization problem for maximizing the memory fidelity given energy constraints, and propose a biconvex

Updated: 2020-01-14
• arXiv.cs.AR Pub Date : 2019-08-19
Karthik Ganesan; Srinivasa Shashank Nuthakki

Existing techniques to ensure functional correctness and hardware trust during pre-silicon verification face severe limitations. In this work, we systematically leverage two key ideas: 1) Symbolic Quick Error Detection (Symbolic QED or SQED), a recent bug detection and localization technique using Bounded Model Checking (BMC); and 2) Symbolic starting states, to present a method that: i) Effectively

Updated: 2020-01-09
• arXiv.cs.AR Pub Date : 2020-01-03

Stochastic unary computing provides low-area circuits. However, the required area-consuming stochastic number generators (SNGs) in these circuits can diminish their overall gain in area, particularly if several SNGs are required. We propose area-efficient SNGs by sharing the permuted output of one linear feedback shift register (LFSR) among several SNGs. With no hardware overhead, the proposed architecture

Updated: 2020-01-08
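
The LFSR-sharing idea in the abstract above can be sketched in software. The following is only an illustration under stated assumptions: an 8-bit Galois LFSR with the standard maximal-length polynomial x^8 + x^6 + x^5 + x^4 + 1, a comparator-based SNG, and bit reversal as the permutation. The abstract does not specify the paper's LFSR width or permutation, and all function names are hypothetical.

```python
def lfsr8_states(seed=1):
    """Yield one full period (255 states) of an 8-bit Galois LFSR.

    Feedback mask 0xB8 corresponds to x^8 + x^6 + x^5 + x^4 + 1, a
    maximal-length polynomial, so every non-zero byte appears once.
    """
    state = seed
    for _ in range(255):
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8


def bit_reverse8(v):
    """Permutation used here for illustration: reverse the 8 bits of v."""
    return int(f"{v:08b}"[::-1], 2)


def sng_stream(x, permute=lambda v: v):
    """Comparator SNG: emit 1 whenever the (permuted) LFSR state is <= x."""
    return [1 if permute(s) <= x else 0 for s in lfsr8_states()]


# Two SNGs driven by the same LFSR: one sees the raw state, the other a
# wire permutation of it, which costs no logic in hardware.
a = sng_stream(64)
b = sng_stream(64, permute=bit_reverse8)
```

Because a bit permutation only reorders the 255 non-zero states, both streams contain exactly 64 ones and so encode the same value 64/255, while their bit patterns differ.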
• arXiv.cs.AR Pub Date : 2020-01-06
Mantas Mikaitis

General algorithms and a hardware accelerator for performing stochastic rounding (SR) are presented. The main goal is to augment the ARM M4F based multi-core processor SpiNNaker 2 with a more flexible rounding functionality than is available in the ARM processor itself. The motivation of adding such an accelerator in hardware is based on our previous results showing improvements in numerical accuracy

Updated: 2020-01-07
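
For readers unfamiliar with stochastic rounding (SR), the operation named in the abstract above, here is a minimal software sketch of its usual definition: round up with probability equal to the fractional distance to the lower grid point. It is not the paper's algorithm or accelerator design; `stochastic_round` and its `step` parameter are hypothetical names.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, choosing the upper neighbour with
    probability equal to the fractional position of x between neighbours."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower            # in [0, 1)
    if rng.random() < frac:          # never true when x is already on the grid
        lower += 1
    return lower * step


# Averaged over many draws, the rounded values approach x itself, which
# round-to-nearest cannot achieve (it would always return 0.0 for 0.3).
rng = random.Random(0)
samples = [stochastic_round(0.3, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Each sample is either 0.0 or 1.0, and `mean` lands close to 0.3, which is the unbiasedness property that motivates SR in low-precision arithmetic.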
• arXiv.cs.AR Pub Date : 2019-07-11
Ming Ling; Jiancong Ge; Guangmin Wang

To mitigate the performance gap between the CPU and main memory, multi-level cache architectures are widely used in modern processors. Therefore, modeling the behaviors of the downstream caches becomes a critical part of the processor performance evaluation in the early stage of Design Space Exploration (DSE). In this paper, we propose a fast and accurate L2 cache reuse distance histogram model, which

Updated: 2020-01-07
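
The abstract above builds on reuse distance histograms. As background only, the sketch below computes such a histogram directly from an address trace using the common LRU stack-distance definition (number of distinct addresses touched since the previous access to the same address). It does not reproduce the paper's L2 model; the function name and the toy trace are assumptions.

```python
from collections import Counter


def reuse_distance_histogram(trace):
    """Map each reuse distance to how often it occurs in `trace`.

    First-time accesses are counted under float('inf').
    """
    stack = []                       # LRU stack, most recent address last
    hist = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distance = len(stack) - 1 - idx   # distinct addresses since last use
            stack.pop(idx)
        else:
            distance = float("inf")
        hist[distance] += 1
        stack.append(addr)
    return hist


hist = reuse_distance_histogram(["a", "b", "c", "a", "b", "b"])
```

For this toy trace the three first-time accesses fall in the infinite bucket, the second accesses to `a` and `b` each have distance 2, and the final repeated `b` has distance 0. In a fully associative LRU cache, an access hits exactly when its reuse distance is smaller than the cache capacity in lines, which is why the histogram is useful for early cache sizing.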
• arXiv.cs.AR Pub Date : 2020-01-03
Karthikeyan Nagarajan; Asmit De; Mohammad Nasim Imtiaz Khan; Swaroop Ghosh

In this paper, we investigate the advanced circuit features such as wordline- (WL) underdrive (prevents retention failure) and overdrive (assists write) employed in the peripherals of Dynamic RAM (DRAM) memories from a security perspective. In an ideal environment, these features ensure fast and reliable read and write operations. However, an adversary can re-purpose them by inserting Trojans to deliver

Updated: 2020-01-06
Contents have been reproduced by permission of the publishers.
