Current journal: IEEE Transactions on Computers
  • Distributed Training of Support Vector Machine on a Multiple-FPGA System
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-11
    Jyotikrishna Dass; Yashwardhan Narawane; Rabi N. Mahapatra; Vivek Sarin

    Support Vector Machine (SVM) is a supervised machine learning model for classification tasks. Training SVM on a large number of data samples is challenging due to the high computational cost and memory requirement. Hence, model training is supported on a high-performance server which typically runs a sequential training algorithm on centralized data. However, as we move towards massive workloads, it

  • Adaptive Model-Based Scheduling in Software Transactional Memory
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-19
    Pierangelo Di Sanzo; Alessandro Pellegrini; Marco Sannicandro; Bruno Ciciani; Francesco Quaglia

    Software Transactional Memory (STM) stands as a powerful concurrent programming paradigm, enabling atomicity and isolation while accessing shared data. On the downside, STM may suffer from performance degradation due to excessive conflicts among concurrent transactions, which waste CPU cycles and energy because of transaction aborts. An approach to cope with this issue consists of putting in

  • Branch Prediction Attack on Blinded Scalar Multiplication
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-09
    Sarani Bhattacharya; Clémentine Maurice; Shivam Bhasin; Debdeep Mukhopadhyay

    In recent years, performance counters have been used as a side channel source to monitor branch mispredictions, in order to attack cryptographic algorithms. However, the literature considers blinding techniques as effective countermeasures against such attacks. In this article, we present the first template attack on the branch predictor. We target blinded scalar multiplications with a side-channel

  • A Modeling Framework for Reliability of Erasure Codes in SSD Arrays
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-27
    Mostafa Kishani; Saba Ahmadian; Hossein Asadi

    The emergence of Solid-State Drives (SSDs) has transformed the data storage industry, where they are rapidly replacing Hard Disk Drives (HDDs) due to their superiority in performance and power. Meanwhile, SSDs have reliability issues due to bit errors, bad blocks, and bad chips. To help reliability, Redundant Array of Independent Disks (RAID) configurations, originally proposed to increase both performance

  • CryptSQLite: SQLite With High Data Security
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31
    Yongzhi Wang; Yulong Shen; Cuicui Su; Jiawen Ma; Lingtong Liu; Xuewen Dong

    SQLite, one of the most popular lightweight database systems, has been widely used in various systems. However, the compact design of SQLite does not give enough consideration to user data security. Specifically, anyone who has obtained access to the database file will be able to read or tamper with the data. Existing encryption-based solutions can only protect data on storage, while still exposing

  • Incremental Throughput Allocation of Heterogeneous Storage With No Disruptions in Dynamic Setting
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31
    ZhiSheng Huo; Limin Xiao; Minyi Guo; Xiaoling Rong

    Solid-state drives (SSDs) have been added into storage systems for improving their performance, which will bring the heterogeneity into the storage medium. The throughput is one of the essential resources in heterogeneous storage systems, and how to allocate the throughput plays a crucial role in user performance. There are many types of research on the throughput allocation of heterogeneous storage

  • Fast Encoding Algorithms for Reed–Solomon Codes With Between Four and Seven Parity Symbols
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03
    Leilei Yu; Zhichang Lin; Sian-Jheng Lin; Yunghsiang S. Han; Nenghai Yu

    This article describes a fast Reed–Solomon encoding algorithm for codes with between four and seven parity symbols. First, we show that the syndrome of Reed–Solomon codes can be computed via the Reed–Muller transform. Based on this result, the fast encoding algorithm is then derived. Analysis shows that the proposed approach asymptotically requires 3 XORs per data bit, representing an improvement over previous

  • All-Digital Control-Theoretic Scheme to Optimize Energy Budget and Allocation in Multi-Cores
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03
    Davide Zoni; Luca Cremona; William Fornaciari

    The Internet-of-Things (IoT) revolution fueled new challenges and opportunities to achieve computational efficiency goals. Embedded devices are required to execute multiple applications for which a suitable distribution of the computing power must be adapted at run-time. Such complex hardware platforms have to sustain the continuous acquisition and processing of data under severe energy budget constraints

  • Joint Management of CPU and NVDIMM for Breaking Down the Great Memory Wall
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-06
    Chun-Feng Wu; Yuan-Hao Chang; Ming-Chang Yang; Tei-Wei Kuo

    NVDIMM is a production-ready device that provides larger memory space at lower cost. However, directly placing NVDIMM as the main memory would seriously degrade the system performance because of the “great memory wall” caused by the fact that in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this article, we present a joint

  • Crossbar-Constrained Technology Mapping for ReRAM Based In-Memory Computing
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-07
    Debjyoti Bhattacharjee; Yaswanth Tavva; Arvind Easwaran; Anupam Chattopadhyay

    In-memory computing has gained significant attention due to the potential for dramatic improvement in speed and energy. Redox-based resistive RAMs (ReRAMs), capable of non-volatile storage and logic operations simultaneously have been used for logic-in-memory computing approaches. To this effect, we propose ReRAM based VLIW Architecture for in-Memory comPuting (ReVAMP), supported by a detailed

  • Automated Performance Modeling of HPC Applications Using Machine Learning
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10
    Jingwei Sun; Guangzhong Sun; Shiyan Zhan; Jiepeng Zhang; Yong Chen

    Automated performance modeling and performance prediction of parallel programs are highly valuable in many use cases, such as in guiding task management and job scheduling, offering insights of application behaviors, and assisting resource requirement estimation. The performance of parallel programs is affected by numerous factors, including but not limited to hardware, applications, algorithms, and

  • A Neural Network Based Fault Management Scheme for Reliable Image Processing
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10
    Matteo Biasielli; Cristiana Bolchini; Luca Cassano; Erdem Koyuncu; Antonio Miele

    Traditional reliability approaches introduce relevant costs to achieve unconditional correctness during data processing. However, many application environments are inherently tolerant to a certain degree of inexactness or inaccuracy. In this article, we focus on the practical scenario of image processing in space, a domain where faults are a threat, while the applications are inherently tolerant to

  • WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20
    Chao Wang; Lei Gong; Xiang Ma; Xi Li; Xuehai Zhou

    Recommendation algorithms, such as Neighborhood-based Collaborative-Filtering (CF), have been widely applied in various emerging machine learning applications. However, under the circumstance of the explosive big data, it poses significant challenges to CF recommendation algorithms as it is becoming quite time and energy-consuming. It has to be optimized and accelerated by powerful engines to process

  • MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-02
    Byeongho Kim; Jongwook Chung; Eojin Lee; Wonkyung Jung; Sunjung Lee; Jaewan Choi; Jaehyun Park; Minbok Wi; Sukhan Lee; Jung Ho Ahn

    Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs have poor reusability and the ever-increasing size of the matrices becomes too large to fit in the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul is determined by those of main-memory DRAM. Therefore, computing MV-mul

  • $\pi$-BA: Bundle Adjustment Hardware Accelerator Based on Distribution of 3D-Point Observations
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-02
    Qiang Liu; Shuzhen Qin; Bo Yu; Jie Tang; Shaoshan Liu

    Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, street view map generation, and even space exploration etc. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large

  • Machine Learning Computers With Fractal von Neumann Architecture
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-20
    Yongwei Zhao; Zhe Fan; Zidong Du; Tian Zhi; Ling Li; Qi Guo; Shaoli Liu; Zhiwei Xu; Tianshi Chen; Yunji Chen

    Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of programming productivity. However, with the fast development

  • Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-18
    Yijin Guan; Guangyu Sun; Zhihang Yuan; Xingchen Li; Ningyi Xu; Shu Chen; Jason Cong; Yuan Xie

    Convolutional neural networks (CNNs) have achieved great success in numerous AI applications. To improve inference efficiency of CNNs, researchers have proposed various pruning techniques to reduce both computation intensity and storage overhead. These pruning techniques result in multi-level sparsity irregularities in CNNs. Together with that in activation matrices, which is induced by employment

  • State of the Journal
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-11
    Ahmed Louri

    Presents the introductory editorial for this issue of the publication.

  • Approximate Restoring Dividers Using Inexact Cells and Estimation From Partial Remainders
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-15
    Elizabeth Adams; Suganthi Venkatachalam; Seok-Bum Ko

    Approximate computing can be used in error-resilient applications to reduce power consumption and increase overall circuit performance. This article introduces two approximate dividers with restoring array-based architecture that achieve substantial hardware savings while maintaining high accuracy when compared to existing approximate designs. The first design replaces exact restoring divider cells

  • Exploiting Asymmetric Errors for LDPC Decoding Optimization on 3D NAND Flash Memory
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-18
    Qiao Li; Liang Shi; Yufei Cui; Chun Jason Xue

    By stacking layers vertically, the adoption of 3D NAND has significantly increased the capacity for storage systems. The complex structure of 3D NAND introduces more errors than planar flash. To address the reliability issue, low-density parity-check (LDPC) code with a strong error correction capability is now widely applied on 3D NAND flash memory. However, LDPC has long decoding latency when the

  • Arithmetic Approaches for Rigorous Design of Reliable Fixed-Point LTI Filters
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-31
    Anastasia Volkova; Thibault Hilaire; Christoph Lauter

    In this paper we target the Fixed-Point (FxP) implementation of Linear Time-Invariant (LTI) filters evaluated with state-space equations. We assume that wordlengths are fixed and that our goal is to determine binary point positions that guarantee the absence of overflows while maximizing accuracy. We provide a model for the worst-case error analysis of FxP filters that gives tight bounds on the output

  • Graph Similarity and its Applications to Hardware Security
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-26
    Marc Fyrbiak; Sebastian Wallat; Sascha Reinhard; Nicolai Bissantz; Christof Paar

    Hardware reverse engineering is a powerful and universal tool for both security engineers and adversaries. From a defensive perspective, it allows for detection of intellectual property infringements and hardware Trojans, while it simultaneously can be used for product piracy and malicious circuit manipulations. From a designer's perspective, it is crucial to have an estimate of the costs associated

  • NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-09
    Neng Zhang; Qiao Qin; Hang Yuan; Chenggao Zhou; Shouyi Yin; ShaoJun Wei; Leibo Liu

    Large integer multiplication, or large degree polynomial multiplication, is the most time-consuming operation in fully homomorphic encryption (FHE). Low area and power consumption are difficult to maintain while achieving high performance for a large size multiplier. To address this issue, an area-efficient low-power architecture for multiplication, named NTTU, is proposed in this article. First, a

  • High Throughput/Gate AES Hardware Architectures Based on Datapath Compression
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-04
    Rei Ueno; Sumio Morioka; Noriyuki Miura; Kohei Matsuda; Makoto Nagata; Shivam Bhasin; Yves Mathieu; Tarik Graba; Jean-Luc Danger; Naofumi Homma

    This article proposes highly efficient Advanced Encryption Standard (AES) hardware architectures that support either encryption only or both encryption and decryption. New operation-reordering and register-retiming techniques presented in this article allow us to unify the inversion circuits in SubBytes and InvSubBytes without any delay overhead. In addition, a new optimization technique for minimizing linear

  • A Management Scheme of Multi-Level Retention-Time Queues for Improving the Endurance of Flash-Memory Storage Devices
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-20
    David Kuang-Hui Yu; Jen-Wei Hsieh

    As flash memory technology has been scaled down to 1x nm and more bits can be stored in a cell, the storage density of flash memory has been significantly improved. However, these technical trends also severely hurt the programming speed and endurance of flash memory. The internal data retention time is the duration for which a flash cell can correctly hold data. By relaxing internal data retention

  • Performance Analysis for Heterogeneous Cloud Servers Using Queueing Theory
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-28
    Shuang Wang; Xiaoping Li; Rubén Ruiz

    In this article, we consider the problem of selecting appropriate heterogeneous servers in cloud centers for stochastically arriving requests in order to obtain an optimal tradeoff between the expected response time and power consumption. Heterogeneous servers with uncertain setup times are far more common than homogenous ones. The heterogeneity of servers and stochastic requests pose great challenges

  • Bufferless Network-on-Chips With Bridged Multiple Subnetworks for Deflection Reduction and Energy Savings
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-18
    Xiyue Xiang; Purushottam Sigdel; Nian-Feng Tzeng

    A bufferless network-on-chip (NoC) can deliver high energy efficiency, but such a NoC is subject to growing deflection when its traffic load rises. This article proposes Deflection Containment (DeC) for the bufferless NoC to address its notorious shortcomings of excessive deflection for performance improvement and energy savings. With multiple subnetworks bridged by an added link between two corresponding

  • PRS: A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-19
    Jiang Zhou; Yong Chen; Wei Xie; Dong Dai; Shuibing He; Weiping Wang

    Data replication is a key technique to achieve high data availability, reliability, and optimized performance in distributed storage systems. In recent years, with emerged new storage devices, heterogeneous object-based storage systems, such as a storage system with a mix of hard disk drives, solid state drives, and other non-volatile memory devices have become increasingly attractive since they combine

  • Mangrove: An Inference-Based Dynamic Invariant Mining for GPU Architectures
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-18
    Nicola Bombieri; Federico Busato; Alessandro Danese; Luca Piccolboni; Graziano Pravadelli

    Likely invariants model properties that hold in operating conditions of a computing system. Dynamic mining of invariants aims at extracting logic formulas representing such properties from the system execution traces, and it is widely used for verification of intellectual property (IP) blocks. Although the extracted formulas represent likely invariants that hold in the considered traces, there is no

  • CIMAT: A Compute-In-Memory Architecture for On-chip Training Based on Transpose SRAM Arrays
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-13
    Hongwu Jiang; Xiaochen Peng; Shanshi Huang; Shimeng Yu

    Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to the extensive computation and memory bandwidth requirements. To solve the bottleneck of the memory wall problem, the compute-in-memory (CIM) approach exploits analog computation along the bit line of the memory array and thus significantly speeds up the vector-matrix

  • Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-05
    Xi Zeng; Tian Zhi; Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Bingrui Wang; Yuanbo Wen; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Ninghui Sun; Yunji Chen

    Neural networks have become the dominant algorithms rapidly as they achieve state-of-the-art performance in a broad range of applications such as image recognition, speech recognition, and natural language processing. However, neural networks keep moving toward deeper and larger architectures, posing a great challenge to hardware systems due to the huge amount of data and computations. Although sparsity

  • Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-21
    Liancheng Jia; Yun Liang; Xiuhong Li; Liqiang Lu; Shengen Yan

    Modern Convolutional Neural Networks (CNNs) require a massive amount of convolution operations. To address the overwhelming computation problem, Winograd and FFT fast algorithms have been used as effective approaches to reduce the number of multiplications. Inputs and filters are transformed into special domains and then multiplied element-wise, which can be transformed into batched GEMM operation

  • A Neural Network-Based On-Device Learning Anomaly Detector for Edge Devices
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-17
    Mineto Tsukada; Masaaki Kondo; Hiroki Matsutani

    Semi-supervised anomaly detection is an approach to identify anomalies by learning the distribution of normal data. Backpropagation neural networks (i.e., BP-NNs) based approaches have recently drawn attention because of their good generalization capability. In a typical situation, BP-NN-based models are iteratively optimized in server machines with input data gathered from the edge devices. However

  • REMOTE: Robust External Malware Detection Framework by Using Electromagnetic Signals
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-07
    Nader Sehatbakhsh; Alireza Nazari; Monjur Alam; Frank Werner; Yuanda Zhu; Alenka Zajic; Milos Prvulovic

    Cyber-physical systems (CPS) are controlling many critical and sensitive aspects of our physical world while being continuously exposed to potential cyber-attacks. These systems typically have limited performance, memory, and energy reserves, which limits their ability to run existing advanced malware protection, and that, in turn, makes securing them very challenging. To tackle these problems, this

  • Lightweight Key Encapsulation Using LDPC Codes on FPGAs
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-21
    Jingwei Hu; Marco Baldi; Paolo Santini; Neng Zeng; San Ling; Huaxiong Wang

    In this paper, we present a lightweight hardware design for a recently proposed quantum-safe key encapsulation mechanism based on QC-LDPC codes called LEDAkem, which has been admitted as a round-2 candidate to the NIST post-quantum standardization project. Existing implementations focus on high speed while few of them take into account area or power efficiency, which are particularly decisive for low-cost

  • Towards the Integration of Reverse Converters into the RNS Channels
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-21
    Leonel Sousa; Rogério Paludo; Paulo Martins; Hector Pettenghi

    The conversion from a Residue Number System (RNS) to a weighted representation is a costly inter-modulo operation that introduces delay and area overhead to RNS processors, while also increasing power consumption. This paper proposes a new approach to decompose the reverse conversion into operations that can be processed by the arithmetic units already present in the RNS independent channels. This

  • ApGAN: Approximate GAN for Robust Low Energy Learning From Imprecise Components
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-23
    Arman Roohi; Shadi Sheikhfaal; Shaahin Angizi; Deliang Fan; Ronald F DeMara

    A Generative Adversarial Network (GAN) is an adversarial learning approach which empowers conventional deep learning methods by alleviating the demands of massive labeled datasets. However, GAN training can be computationally-intensive limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithm and hardware

  • Impeccable Circuits
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-23
    Anita Aghaie; Amir Moradi; Shahram Rasoolzadeh; Aein Rezaei Shahmirzadi; Falk Schellenberg; Tobias Schneider

    By injecting faults, active physical attacks pose serious threats to cryptographic hardware where Concurrent Error Detection (CED) schemes are promising countermeasures. They are usually based on an Error-Detecting Code (EDC) which enables detecting certain injected faults depending on the specification of the underlying code. Here, we propose a methodology to enable correct, practical, and robust

  • Hotness- and Lifetime-Aware Data Placement and Migration for High-Performance Deep Learning on Heterogeneous Memory Systems
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-25
    Myeonggyun Han; Jihoon Hyun; Seongbeom Park; Woongki Baek

    Heterogeneous memory systems that comprise memory nodes with disparate architectural characteristics (e.g., DRAM and high-bandwidth memory (HBM)) have surfaced as a promising solution in a variety of computing domains ranging from embedded to high-performance computing. Since deep learning (DL) is one of the most widely-used workloads in various computing domains, it is crucial to explore efficient

  • Energy-Efficient Pattern Recognition Hardware With Elementary Cellular Automata
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-25
    Alejandro Morán; Christiam F. Frasser; Miquel Roca; Josep L. Rosselló

    The development of power-efficient Machine Learning Hardware is of high importance to provide Artificial Intelligence (AI) characteristics to those devices operating at the Edge. Unfortunately, state-of-the-art data-driven AI techniques such as deep learning are too costly in terms of hardware and energy requirements for Edge Computing (EC) devices. Recently, Cellular Automata (CA) have been proposed

  • Design and Analysis of Efficient Maximum/Minimum Circuits for Stochastic Computing
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-28
    Michael Lunglmayr; Daniel Wiesinger; Werner Haselmayr

    In stochastic computing (SC), a real-valued number is represented by a stochastic bit stream, encoding its value in the probability of obtaining a one. This leads to a significantly lower hardware effort for various functions and provides a higher tolerance to errors (e.g., bit flips) compared to binary radix representation. The implementation of a stochastic max/min function is important for many
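
The bit-stream encoding mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common unipolar convention, in which a value in [0, 1] is the probability that any given bit is one; the function names `encode`, `decode`, and `flip_bit` are hypothetical and not taken from the paper, and the sketch does not implement the max/min circuits the article proposes.

```python
import random


def encode(value, length, rng):
    """Encode a value in [0, 1] as a stochastic bit stream (unipolar assumption).

    Each bit is independently 1 with probability `value`.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("unipolar encoding needs a value in [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def flip_bit(bits, index):
    """Return a copy of the stream with one bit inverted (a single bit-flip error)."""
    corrupted = list(bits)
    corrupted[index] ^= 1
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    stream = encode(0.25, 10_000, rng)
    print("decoded estimate:", decode(stream))
    # A single flipped bit shifts the estimate by exactly 1/length,
    # no matter which position is hit.
    print("after one bit flip:", decode(flip_bit(stream, 0)))
```

Every bit carries the same weight, so one flip moves the decoded value by only 1/length, whereas a flip in the most significant bit of a binary radix word changes the value by half its range. This is one way to read the error-tolerance claim in the abstract.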

  • Pursuing Extreme Power Efficiency With PPCC Guided NoC DVFS
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-28
    Yuan Yao; Zhonghai Lu

    In sharp contrast to conventional performance indicative based Network-on-Chip (NoC) DVFS, where the direct relation between application performance and NoC power consumption is missing, we exploit the concept of Performance-Power Characteristic Curve (PPCC) newly proposed in the literature to approach maximum NoC power efficiency. PPCC, which defines the direct relation between application performance

  • Novel Methods for Efficient Realization of Logic Functions Using Switching Lattices
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-31
    Levent Aksoy; Mustafa Altun

    Two-dimensional switching lattices including four-terminal switches are introduced as alternative structures to realize logic functions, aiming to outperform the designs consisting of one-dimensional two-terminal switches. Exact and approximate algorithms have been proposed for the problem of finding a switching lattice which implements a given logic function and has the minimum size, i.e., a minimum

  • Grow and Prune Compact, Fast, and Accurate LSTMs
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-20
    Xiaoliang Dai; Hongxu Yin; Niraj K. Jha

    Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level nonlinear control

  • Energy Efficient On-Demand Dynamic Branch Prediction Models
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-29
    Milad Mohammadi; Song Han; Ehsan Atoofian; Amirali Baniasadi; Tor M. Aamodt; William J. Dally

    The branch predictor unit (BPU) is among the main energy consuming components in out-of-order (OoO) processors. For integer applications, we find 16 percent of the processor energy is consumed by the BPU. BPU is accessed in parallel with the instruction cache before it is known if a fetch group contains control instructions. We find 85 percent of BPU lookups are done for non-branch operations, and

  • Pre-Defined Sparsity for Low-Complexity Convolutional Neural Networks
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-10
    Souvik Kundu; Mahdi Nazemi; Massoud Pedram; Keith M. Chugg; Peter A. Beerel

    The high energy cost of processing deep convolutional neural networks impedes their ubiquitous deployment in energy-constrained platforms such as embedded systems and IoT devices. This article introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters. Due to the efficient storage of our periodic sparse kernels, the parameter

  • Accelerating Deep Learning Systems via Critical Set Identification and Model Compression
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-31
    Rui Han; Chi Harold Liu; Shilin Li; Shilin Wen; Xue Liu

    Modern distributed engines are increasingly deployed to accelerate large-scale deep learning (DL) training jobs. While the parallelism of distributed workers/nodes promises scalability, the computation and communication overheads of the underlying iterative solving algorithms, e.g., stochastic gradient descent, unfortunately become the bottleneck for distributed DL training jobs. Existing approaches

  • Algorithms for Inversion Mod $p^k$
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-30
    Çetin Kaya Koç

    This article describes and analyzes all existing algorithms for computing $x=a^{-1}\pmod{p^k}$ for a prime $p$, and also introduces a new algorithm based on the exact solution of linear equations using $p$-adic expansions. The algorithm starts with the initial value $c=a^{-1}\pmod{p}$ and iteratively computes the digits of the inverse $x=a^{-1}\pmod{p^k}$ in base $p$. The mod 2 version of the
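
The digit-by-digit idea in this abstract can be sketched as follows. This is a minimal illustration in the spirit of the description (start from $c=a^{-1}\pmod{p}$, then produce one base-$p$ digit of the inverse per step) and not necessarily the article's exact algorithm. The function name `inverse_mod_prime_power` and the running remainder `b` are assumptions made for this example.

```python
def inverse_mod_prime_power(a: int, p: int, k: int) -> int:
    """Return x with a * x == 1 (mod p**k), built one base-p digit at a time.

    Invariant kept by the loop: a * x + b * p**i == 1, where i is the number
    of digits produced so far. Requires Python 3.8+ for pow(a, -1, p).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if a % p == 0:
        raise ValueError("a must not be divisible by p")

    c = pow(a, -1, p)   # initial value c = a^{-1} mod p
    b = 1               # remainder of the linear equation a * x = 1
    x = 0               # inverse accumulated so far
    power = 1           # p**i

    for _ in range(k):
        digit = (c * b) % p          # next base-p digit, chosen so a*digit == b (mod p)
        x += digit * power
        b = (b - a * digit) // p     # exact: the numerator is a multiple of p
        power *= p
    return x


if __name__ == "__main__":
    x = inverse_mod_prime_power(3, 2, 4)
    print(x, (3 * x) % 2**4)   # the second value is 1 by construction
```

Each step needs only arithmetic modulo $p$ plus one exact division by $p$, which is what makes a digit-serial formulation attractive when $p$ is small, for example $p=2$.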

  • A Fast Filtering Mechanism to Improve Efficiency of Large-Scale Video Analytics
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-30
    Chen Zhang; Qiang Cao; Hong Jiang; Wenhui Zhang; Jingjun Li; Jie Yao

    Surveillance cameras are ubiquitous around us. Emerging full-feature object-detection models can analyze surveillance videos with high accuracy but consume much computation. Directly applying these models for practical scenarios with large-scale cameras is prohibitively expensive. This, however, is wasteful and unnecessary considering that user-defined anomalies occur rarely among these videos. Therefore

  • An Adaptive Thermal Management Framework for Heterogeneous Multi-Core Processors
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-28
    Young Geun Kim; Minyong Kim; Joonho Kong; Sung Woo Chung

    Off-the-shelf embedded systems have adopted heterogeneous multi-core processors which have high-performance big cores and low-power small cores. Though there are two different types of cores in heterogeneous multi-core processors, conventional DVFS (Dynamic Voltage and Frequency Scaling)-based DTM (Dynamic Thermal Management) techniques do not utilize the different types of cores to cool down hot cores

  • Accurate Cost Estimation of Memory Systems Utilizing Machine Learning and Solutions from Computer Vision for Design Automation
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23
    Lorenzo Servadei; Edoardo Mosca; Elena Zennaro; Keerthikumara Devarajegowda; Michael Werner; Wolfgang Ecker; Robert Wille

    Hardware/software co-designs are usually defined at high levels of abstractions at the beginning of the design process in order to provide a variety of options on how to realize a system. This allows for design exploration which relies on knowing the costs of different design configurations (with respect to hardware usage and firmware metrics). To this end, methods for cost estimation are frequently

  • AxMAP: Making Approximate Adders Aware of Input Patterns
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23
    Morteza Rezaalipour; Mohammad Rezaalipour; Masoud Dehyadegari; Mahdi Nazm Bojnordi

    Making approximate computing specific to user requirements is crucial to system performance, energy-efficiency, and reliability. However, developing hardware for such optimization becomes a significant challenge due to the high cost of examining all potential choices while exploring a large design space. One determinant aspect of exploring a design space is the efficiency of evaluating error metrics

  • A Deep Reinforcement Learning Based Offloading Game in Edge Computing
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23
    Yufeng Zhan; Song Guo; Peng Li; Jiang Zhang

    Edge computing is a new paradigm to provide strong computing capability at the edge of pervasive radio access networks close to users. A critical research challenge of edge computing is to design an efficient offloading strategy to decide which tasks can be offloaded to edge servers with limited resources. Although many research efforts attempt to address this challenge, they need centralized control

  • Page Reusability-Based Cache Partitioning for Multi-Core Systems
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
    Jiwoong Park; Heonyoung Yeom; Yongseok Son

    Most modern multi-core processors provide a shared last level cache (LLC) where data from all cores are placed to improve performance. However, this opens a new challenge for cache management, owing to cache pollution. With cache pollution, data with weak temporal locality can evict other data with strong temporal locality when both are mapped into the same cache set. In this article, we propose page

  • XeFlow: Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
    Zhifang Li; Beicheng Peng; Chuliang Weng

    Nowadays, GPUs have achieved high throughput computing by running plenty of threads. However, owing to disjoint memory spaces of discrete CPU-GPU systems, exploiting CPU and GPU within a data processing pipeline is a non-trivial issue, which can only be resolved by the coarse-grained workflow of “copy-kernel-copy” or its variants in essence. There is an underlying bottleneck caused by frequent inter-processor

  • Request Flow Coordination for Growing-Scale Solid-State Drives
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
    Ming-Chang Yang; Yuan-Hao Chang; Tei-Wei Kuo; Chun-Feng Wu

    Performance-intensive applications have driven both interface and architecture changes in high-end, growing-scale solid-state drives (SSDs). However, we observe that most of the time, actual drive performance cannot easily be scaled or boosted as the internal resources of growing-scale SSDs increase, due to the potential congestion of I/O requests. Such observation inspires this article

  • Hierarchical Orchestration of Disaggregated Memory
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
    Wenqi Cao; Ling Liu

    This article presents XMemPod, a hierarchical disaggregated memory orchestration system. XMemPod virtualizes cluster wide memory to scale large memory workloads in virtualized clouds. It makes three novel contributions: (1) XMemPod offers efficient, transparent, and dynamic sharing of available memory that is disaggregated across VMs on the same host or in the cluster. (2) XMemPod provides a hierarchical

  • Per-Operation Reusability Based Allocation and Migration Policy for Hybrid Cache
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-09-27
    Minsik Oh; Kwangsu Kim; Duheon Choi; Hyuk-Jun Lee; Eui-Young Chung

    Recently, a hybrid cache consisting of SRAM and STT-RAM has attracted much attention as a future memory by complementing each other with different memory characteristics. Prior works focused on developing data allocation and migration techniques considering write-intensity to reduce write energy at STT-RAM. However, these works often neglect the impact of operation-specific reusability of a cache line

  • Footprint-Based DIMM Hotplug
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-04
    Shinobu Miwa; Masaya Ishihara; Hayato Yamaki; Hiroki Honda; Martin Schulz

    Power-efficiency has become one of the most critical concerns for HPC as we continue to scale computational capabilities. A significant fraction of system power is spent on large main memories, mainly caused by the substantial amount of DIMM standby power needed. However, while necessary for some workloads, for many workloads large memory configurations are too rich, i.e., these workloads only make

  • Collaborative Adaptation for Energy-Efficient Heterogeneous Mobile SoCs
    IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-04
    Amit Kumar Singh; Karunakar Reddy Basireddy; Alok Prakash; Geoff V. Merrett; Bashir M. Al-Hashimi

    Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are becoming prevalent in embedded computing, and they need to execute applications concurrently. However, existing run-time management approaches do not perform adaptive mapping and thread-partitioning of applications while exploiting both CPU and GPU cores at the same time. In this paper, we propose an adaptive mapping and thread-partitioning

Contents have been reproduced by permission of the publishers.