Current journal: IEEE Transactions on Computers
  • State of the Journal
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2020-03-11
    Ahmed Louri

    Presents the introductory editorial for this issue of the publication.

  • Approximate Restoring Dividers Using Inexact Cells and Estimation From Partial Remainders
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-15
    Elizabeth Adams; Suganthi Venkatachalam; Seok-Bum Ko

    Approximate computing can be used in error-resilient applications to reduce power consumption and increase overall circuit performance. This article introduces two approximate dividers with restoring array-based architecture that achieve substantial hardware savings while maintaining high accuracy when compared to existing approximate designs. The first design replaces exact restoring divider cells

  • Exploiting Asymmetric Errors for LDPC Decoding Optimization on 3D NAND Flash Memory
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-12-18
    Qiao Li; Liang Shi; Yufei Cui; Chun Jason Xue

    By stacking layers vertically, the adoption of 3D NAND has significantly increased the capacity for storage systems. The complex structure of 3D NAND introduces more errors than planar flash. To address the reliability issue, low-density parity-check (LDPC) code with a strong error correction capability is now widely applied on 3D NAND flash memory. However, LDPC has long decoding latency when the

  • Arithmetic Approaches for Rigorous Design of Reliable Fixed-Point LTI Filters
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-31
    Anastasia Volkova; Thibault Hilaire; Christoph Lauter

    In this paper we target the Fixed-Point (FxP) implementation of Linear Time-Invariant (LTI) filters evaluated with state-space equations. We assume that wordlengths are fixed and that our goal is to determine binary point positions that guarantee the absence of overflows while maximizing accuracy. We provide a model for the worst-case error analysis of FxP filters that gives tight bounds on the output

  • Graph Similarity and its Applications to Hardware Security
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-26
    Marc Fyrbiak; Sebastian Wallat; Sascha Reinhard; Nicolai Bissantz; Christof Paar

    Hardware reverse engineering is a powerful and universal tool for both security engineers and adversaries. From a defensive perspective, it allows for detection of intellectual property infringements and hardware Trojans, while it simultaneously can be used for product piracy and malicious circuit manipulations. From a designer's perspective, it is crucial to have an estimate of the costs associated

  • NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-12-09
    Neng Zhang; Qiao Qin; Hang Yuan; Chenggao Zhou; Shouyi Yin; ShaoJun Wei; Leibo Liu

    Large integer multiplication, or large degree polynomial multiplication, is the most time-consuming operation in fully homomorphic encryption (FHE). Low area and power consumption are difficult to maintain while achieving high performance for a large size multiplier. To address this issue, an area-efficient low-power architecture for multiplication, named NTTU, is proposed in this article. First, a

  • High Throughput/Gate AES Hardware Architectures Based on Datapath Compression
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-12-04
    Rei Ueno; Sumio Morioka; Noriyuki Miura; Kohei Matsuda; Makoto Nagata; Shivam Bhasin; Yves Mathieu; Tarik Graba; Jean-Luc Danger; Naofumi Homma

    This article proposes highly efficient Advanced Encryption Standard (AES) hardware architectures that support either encryption only or both encryption and decryption. New operation-reordering and register-retiming techniques presented in this article allow us to unify the inversion circuits in SubBytes and InvSubBytes without any delay overhead. In addition, a new optimization technique for minimizing linear

  • A Management Scheme of Multi-Level Retention-Time Queues for Improving the Endurance of Flash-Memory Storage Devices
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-20
    David Kuang-Hui Yu; Jen-Wei Hsieh

    As flash memory technology has been scaled down to 1x nm and more bits can be stored in a cell, the storage density of flash memory has been significantly improved. However, these technical trends also severely hurt the programming speed and endurance of flash memory. The internal data retention time is the duration for which a flash cell can correctly hold data. By relaxing internal data retention

  • Performance Analysis for Heterogeneous Cloud Servers Using Queueing Theory
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-28
    Shuang Wang; Xiaoping Li; Rubén Ruiz

    In this article, we consider the problem of selecting appropriate heterogeneous servers in cloud centers for stochastically arriving requests in order to obtain an optimal tradeoff between the expected response time and power consumption. Heterogeneous servers with uncertain setup times are far more common than homogeneous ones. The heterogeneity of servers and stochastic requests pose great challenges

  • Bufferless Network-on-Chips With Bridged Multiple Subnetworks for Deflection Reduction and Energy Savings
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-12-18
    Xiyue Xiang; Purushottam Sigdel; Nian-Feng Tzeng

    A bufferless network-on-chip (NoC) can deliver high energy efficiency, but such a NoC is subject to growing deflection when its traffic load rises. This article proposes Deflection Containment (DeC) for the bufferless NoC to address its notorious shortcomings of excessive deflection for performance improvement and energy savings. With multiple subnetworks bridged by an added link between two corresponding

  • PRS: A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-19
    Jiang Zhou; Yong Chen; Wei Xie; Dong Dai; Shuibing He; Weiping Wang

    Data replication is a key technique to achieve high data availability, reliability, and optimized performance in distributed storage systems. In recent years, with the emergence of new storage devices, heterogeneous object-based storage systems, such as a storage system with a mix of hard disk drives, solid state drives, and other non-volatile memory devices, have become increasingly attractive since they combine

  • Mangrove: An Inference-Based Dynamic Invariant Mining for GPU Architectures
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-18
    Nicola Bombieri; Federico Busato; Alessandro Danese; Luca Piccolboni; Graziano Pravadelli

    Likely invariants model properties that hold in operating conditions of a computing system. Dynamic mining of invariants aims at extracting logic formulas representing such properties from the system execution traces, and it is widely used for verification of intellectual property (IP) blocks. Although the extracted formulas represent likely invariants that hold in the considered traces, there is no

  • REMOTE: Robust External Malware Detection Framework by Using Electromagnetic Signals
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-07
    Nader Sehatbakhsh; Alireza Nazari; Monjur Alam; Frank Werner; Yuanda Zhu; Alenka Zajic; Milos Prvulovic

    Cyber-physical systems (CPS) are controlling many critical and sensitive aspects of our physical world while being continuously exposed to potential cyber-attacks. These systems typically have limited performance, memory, and energy reserves, which limits their ability to run existing advanced malware protection, and that, in turn, makes securing them very challenging. To tackle these problems, this

  • Lightweight Key Encapsulation Using LDPC Codes on FPGAs
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-21
    Jingwei Hu; Marco Baldi; Paolo Santini; Neng Zeng; San Ling; Huaxiong Wang

    In this paper, we present a lightweight hardware design for a recently proposed quantum-safe key encapsulation mechanism based on QC-LDPC codes called LEDAkem, which has been admitted as a round-2 candidate to the NIST post-quantum standardization project. Existing implementations focus on high speed while few of them take into account area or power efficiency, which are particularly decisive for low-cost

  • Towards the Integration of Reverse Converters into the RNS Channels
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-21
    Leonel Sousa; Rogério Paludo; Paulo Martins; Hector Pettenghi

    The conversion from a Residue Number System (RNS) to a weighted representation is a costly inter-modulo operation that introduces delay and area overhead to RNS processors, while also increasing power consumption. This paper proposes a new approach to decompose the reverse conversion into operations that can be processed by the arithmetic units already present in the RNS independent channels. This
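
As a point of reference for the reverse conversion this abstract describes, the following is a minimal sketch of the textbook Chinese Remainder Theorem (CRT) conversion from residues back to a weighted (ordinary integer) value. It is not the decomposition proposed in the paper; the moduli set `MODULI` and the function names are assumptions chosen for illustration.

```python
from math import prod

# Hypothetical pairwise-coprime moduli (an assumption); dynamic range 7*8*9 = 504.
MODULI = (7, 8, 9)

def to_rns(x, moduli=MODULI):
    """Forward conversion: one independent residue per channel."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the CRT; this step mixes all moduli."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                       # product of the other moduli
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m): inverse of Mi mod m
    return total % M

value = from_rns(rns_add(to_rns(100), to_rns(200)))
```

Addition stays inside each channel, while `from_rns` needs the product of all moduli, which illustrates why the reverse conversion is the inter-modulo step.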

  • ApGAN: Approximate GAN for Robust Low Energy Learning From Imprecise Components
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-23
    Arman Roohi; Shadi Sheikhfaal; Shaahin Angizi; Deliang Fan; Ronald F DeMara

    A Generative Adversarial Network (GAN) is an adversarial learning approach which empowers conventional deep learning methods by alleviating the demands of massive labeled datasets. However, GAN training can be computationally intensive, limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithm and hardware

  • Impeccable Circuits
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-23
    Anita Aghaie; Amir Moradi; Shahram Rasoolzadeh; Aein Rezaei Shahmirzadi; Falk Schellenberg; Tobias Schneider

    By injecting faults, active physical attacks pose serious threats to cryptographic hardware where Concurrent Error Detection (CED) schemes are promising countermeasures. They are usually based on an Error-Detecting Code (EDC) which enables detecting certain injected faults depending on the specification of the underlying code. Here, we propose a methodology to enable correct, practical, and robust

  • Hotness- and Lifetime-Aware Data Placement and Migration for High-Performance Deep Learning on Heterogeneous Memory Systems
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-25
    Myeonggyun Han; Jihoon Hyun; Seongbeom Park; Woongki Baek

    Heterogeneous memory systems that comprise memory nodes with disparate architectural characteristics (e.g., DRAM and high-bandwidth memory (HBM)) have surfaced as a promising solution in a variety of computing domains ranging from embedded to high-performance computing. Since deep learning (DL) is one of the most widely-used workloads in various computing domains, it is crucial to explore efficient

  • Energy-Efficient Pattern Recognition Hardware With Elementary Cellular Automata
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-25
    Alejandro Morán; Christiam F. Frasser; Miquel Roca; Josep L. Rosselló

    The development of power-efficient Machine Learning Hardware is of high importance to provide Artificial Intelligence (AI) characteristics to those devices operating at the Edge. Unfortunately, state-of-the-art data-driven AI techniques such as deep learning are too costly in terms of hardware and energy requirements for Edge Computing (EC) devices. Recently, Cellular Automata (CA) have been proposed

  • Design and Analysis of Efficient Maximum/Minimum Circuits for Stochastic Computing
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-28
    Michael Lunglmayr; Daniel Wiesinger; Werner Haselmayr

    In stochastic computing (SC), a real-valued number is represented by a stochastic bit stream, encoding its value in the probability of obtaining a one. This leads to a significantly lower hardware effort for various functions and provides a higher tolerance to errors (e.g., bit flips) compared to binary radix representation. The implementation of a stochastic max/min function is important for many
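
The bit-stream representation described in this abstract can be sketched as follows. This is only an illustration of encoding and decoding a value as the probability of a one; it is not the max/min circuit from the paper. The function names, the stream length, and the seed are assumptions.

```python
import random

def encode(p, length, rng):
    """Return a bit stream whose bits are 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(stream):
    """Estimate the encoded value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)             # fixed seed (an assumption) for repeatability
stream = encode(0.3, 10_000, rng)
estimate = decode(stream)          # close to 0.3, up to sampling noise

# A single bit flip shifts the decoded value by only 1/length.
flipped = stream.copy()
flipped[0] ^= 1
shift = abs(decode(flipped) - estimate)
```

Because every bit carries the same weight, one flipped bit changes the decoded value by `1/len(stream)`, whereas in a binary radix word a flip in the most significant bit changes the value by half the range.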

  • Pursuing Extreme Power Efficiency With PPCC Guided NoC DVFS
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-28
    Yuan Yao; Zhonghai Lu

    In sharp contrast to conventional performance indicative based Network-on-Chip (NoC) DVFS, where the direct relation between application performance and NoC power consumption is missing, we exploit the concept of Performance-Power Characteristic Curve (PPCC) newly proposed in the literature to approach maximum NoC power efficiency. PPCC, which defines the direct relation between application performance

  • Novel Methods for Efficient Realization of Logic Functions Using Switching Lattices
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-31
    Levent Aksoy; Mustafa Altun

    Two-dimensional switching lattices including four-terminal switches are introduced as alternative structures to realize logic functions, aiming to outperform the designs consisting of one-dimensional two-terminal switches. Exact and approximate algorithms have been proposed for the problem of finding a switching lattice which implements a given logic function and has the minimum size, i.e., a minimum

  • Grow and Prune Compact, Fast, and Accurate LSTMs
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-20
    Xiaoliang Dai; Hongxu Yin; Niraj K. Jha

    Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level nonlinear control

  • Energy Efficient On-Demand Dynamic Branch Prediction Models
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-11-29
    Milad Mohammadi; Song Han; Ehsan Atoofian; Amirali Baniasadi; Tor M. Aamodt; William J. Dally

    The branch predictor unit (BPU) is among the main energy consuming components in out-of-order (OoO) processors. For integer applications, we find 16 percent of the processor energy is consumed by the BPU. BPU is accessed in parallel with the instruction cache before it is known if a fetch group contains control instructions. We find 85 percent of BPU lookups are done for non-branch operations, and

  • Per-Operation Reusability Based Allocation and Migration Policy for Hybrid Cache
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-27
    Minsik Oh; Kwangsu Kim; Duheon Choi; Hyuk-Jun Lee; Eui-Young Chung

    Recently, a hybrid cache consisting of SRAM and STT-RAM has attracted much attention as a future memory by complementing each other with different memory characteristics. Prior works focused on developing data allocation and migration techniques considering write-intensity to reduce write energy at STT-RAM. However, these works often neglect the impact of operation-specific reusability of a cache line

  • Footprint-Based DIMM Hotplug
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-04
    Shinobu Miwa; Masaya Ishihara; Hayato Yamaki; Hiroki Honda; Martin Schulz

    Power-efficiency has become one of the most critical concerns for HPC as we continue to scale computational capabilities. A significant fraction of system power is spent on large main memories, mainly caused by the substantial amount of DIMM standby power needed. However, while necessary for some workloads, for many workloads large memory configurations are too rich, i.e., these workloads only make

  • Collaborative Adaptation for Energy-Efficient Heterogeneous Mobile SoCs
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-04
    Amit Kumar Singh; Karunakar Reddy Basireddy; Alok Prakash; Geoff V. Merrett; Bashir M. Al-Hashimi

    Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are becoming prevalent in embedded computing, and they need to execute applications concurrently. However, existing run-time management approaches do not perform adaptive mapping and thread-partitioning of applications while exploiting both CPU and GPU cores at the same time. In this paper, we propose an adaptive mapping and thread-partitioning

  • Optimal Metastability-Containing Sorting via Parallel Prefix Computation
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-07
    Johannes Bund; Christoph Lenzen; Moti Medina

    Friedrichs et al. (TC 2018) showed that metastability can be contained when sorting inputs arising from time-to-digital converters, i.e., measurement values can be correctly sorted without resolving metastability using synchronizers first. However, this work left open whether this can be done by small circuits. We show that this is indeed possible, by providing a circuit that sorts Gray code inputs

  • Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-08
    Shuibing He; Yanlong Yin; Xian-He Sun; Xuechen Zhang; Zongpeng Li

    As the performance gap between processors and storage devices keeps increasing, I/O performance becomes a critical bottleneck of modern high-performance computing systems. In this paper, we propose a pattern-directed and layout-aware data replication design, named PDLA, to improve the performance of parallel I/O systems. PDLA includes an HDD-based scheme H-PDLA and an SSD-based scheme S-PDLA. For

  • Secure and Efficient Control Data Isolation with Register-Based Data Cloaking
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-11
    Xiayang Wang; Fuqian Huang; Haibo Chen

    Attackers often exploit memory corruption vulnerabilities to overwrite control data and further gain control over victim applications. Despite progress in advanced defensive techniques, such attacks still remain a major security threat. In this article, we present Niffler, a new technique that provides lightweight and practical defense against such attacks. Niffler eliminates the threat of memory corruption

  • Adaptive-Length Coding of Image Data for Low-Cost Approximate Storage
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-11
    Qianqian Fan; David J. Lilja; Sachin S. Sapatnekar

    In the past few years, ever-increasing amounts of image data have been generated by users globally, and these images are routinely stored in cold storage systems in compressed formats. This article investigates the use of approximate storage that leverages the use of cheaper, lower reliability memories that can have higher error rates. Since traditional JPEG-based schemes based on variable-length coding

  • A New Class of Single Burst Error Correcting Codes with Parallel Decoding
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-15
    Abhishek Das; Nur A. Touba

    With technology scaling, burst errors or clustered errors are becoming increasingly common in different types of memories. Multiple bit upsets due to particle strikes, write disturbance errors, and magnetic field coupling are a few of the mechanisms which cause clustered errors. In this article, a new class of single burst error correcting codes are presented which correct a single burst of any size

  • WAL-SSD: Address Remapping-Based Write-Ahead-Logging Solid-State Disks
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-16
    Kyuhwa Han; Hyukjoong Kim; Dongkun Shin

    Recent advances in flash memory technology have reduced the cost-per-bit of flash storage devices such as solid-state drives (SSDs), thereby enabling the development of large-capacity SSDs for enterprise-scale storage. However, two major concerns arise in designing SSDs. First, the size of the address mapping table is increasing in proportion to the capacity of the SSD. The SSD-internal firmware, called

  • Low Latency Floating-Point Division and Square Root Unit
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-16
    Javier D. Bruguera

    Digit-recurrence algorithms are widely used in actual microprocessors to compute floating-point division and square root. These iterative algorithms present a good trade-off in terms of performance, area and power. We present a floating-point division and square root unit, which implements a radix-64 floating-point division and a radix-16 floating-point square root. To have an affordable implementation

  • NV-Journaling: Locality-Aware Journaling Using Byte-Addressable Non-Volatile Memory
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-17
    Cheng Chen; Qingsong Wei; Weng-Fai Wong; Chundong Wang

    Modern file systems rely on the journaling mechanism to maintain crash consistency. The use of non-volatile memory (NVM) significantly improves the performance of journaling file systems. However, the superior performance of NVM will increase the likelihood of the journal filling up more often, thereby increasing the frequency of checkpointing. Together with the large amount of random checkpointing

  • Comparing Neural Network Based Decoders for the Surface Code
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-23
    Savvas Varsamopoulos; Koen Bertels; Carmen Garcia Almudever

    Matching algorithms can be used for identifying errors in quantum systems, the most famous being the Blossom algorithm. Recent works have shown that small distance quantum error correction codes can be efficiently decoded by employing machine learning techniques based on neural networks (NN). Various NN-based decoders have been proposed to enhance the decoding performance and the decoding time. Their

  • Power- and Cache-Aware Task Mapping with Dynamic Power Budgeting for Many-Cores
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-20
    Martin Rapp; Mark Sagi; Anuj Pathania; Andreas Herkersdorf; Jörg Henkel

    Two factors primarily affect the performance of multi-threaded tasks on many-core processors with logically-shared and physically-distributed Last-Level Cache (LLC): the LLC latencies of threads running on different cores and the per-core power budgets that aim to guarantee thermally safe operation. Two knobs affect these factors: First, the mapping of threads to cores affects both the LLC latencies

  • Lightweight Power Monitoring Framework for Virtualized Computing Environments
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-23
    James Phung; Young Choon Lee; Albert Y. Zomaya

    The pervasive use of virtualization techniques in today's datacenters poses challenges in power monitoring since it is not possible to directly measure the power consumption of a virtual entity such as a virtual machine (VM) and a container. In this paper, we present cWatts++, a lightweight virtual power meter that enables accurate power usage measurement in virtualized computing environments such

  • New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-05
    Hao Zhang; Dongdong Chen; Seok-Bum Ko

    In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point operations and floating-point operations. For floating-point format, the proposed unit supports one 16-bit MAC operation or sum of two 8-bit multiplications plus a 16-bit addend. To make the proposed MAC unit more

  • Utilization-Tensity Bound for Real-Time DAG Tasks under Global EDF Scheduling
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-20
    Xu Jiang; Jinghao Sun; Yue Tang; Nan Guan

    Utilization bound is a well-known concept in real-time scheduling theory for sequential periodic tasks, which can be used both for quantifying the performance of scheduling algorithms and as efficient schedulability tests. However, the schedulability of parallel real-time task graphs depends not only on utilization but also on another parameter, tensity, the ratio between the longest path length and

  • TTADF: Power Efficient Dataflow-Based Multicore Co-Design Flow
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-27
    Ilkka Hautala; Jani Boutellier; Olli Silvén

    The era of mobile communications and the Internet of Things (IoT) has introduced numerous challenges for mobile processing platforms that are responsible for increasingly complex signal processing tasks from different application domains. In recent years, the power efficiency of computing has been improved by adding more parallelism and workload-specific computing resources to such platforms. However

  • KnightSim: A Fast Discrete Event-Driven Simulation Methodology for Computer Architectural Simulation
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-30
    Christopher E. Giles; Christina L. Peterson; Mark A. Heinrich

    In this paper we introduce a fast discrete event-driven simulation methodology, called KnightSim, that is intended for use in the development of future computer architectural simulations. KnightSim extends an older event-driven simulation library by (1) incorporating corrections to functional issues that were introduced by the recent additions of stack protection, pointer mangling, and source fortification

  • Maximizing I/O Throughput and Minimizing Performance Variation via Reinforcement Learning Based I/O Merging for SSDs
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-03
    Chao Wu; Cheng Ji; Qiao Li; Congming Gao; Riwei Pan; Chenchen Fu; Liang Shi; Chun Jason Xue

    Merging technique is widely adopted by I/O schedulers to maximize system I/O throughput. However, I/O merging could increase the latency of individual I/O, thus incurring prolonged I/O latencies and enlarged performance variations. Even with better system throughput, higher worst-case latency experienced by some requests could block the SSD storage system, which violates the QoS (Quality of Service)

  • A Novel Sequence Generation Approach to Diagnose Faults in Reconfigurable Scan Networks
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-03
    Riccardo Cantoro; Aleksa Damljanovic; Matteo Sonza Reorda; Giovanni Squillero

    With the complexity of nanoelectronic devices rapidly increasing, an efficient way to handle a large number of embedded instruments has become a necessity. The IEEE 1687 standard was introduced to provide flexibility in accessing and controlling such instrumentation through a reconfigurable scan chain. Nowadays, together with testing the system for defects that may affect the scan chains themselves, the

  • Signal Strength-Aware Adaptive Offloading with Local Image Preprocessing for Energy Efficient Mobile Devices
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-03
    Young Geun Kim; Young Seo Lee; Sung Woo Chung

    To prolong battery life of mobile devices, image processing applications often exploit offloading techniques which run some or all of the computations on remote servers. Unfortunately, the existing offloading techniques do not consider the fact that data transmission time and energy consumption of wireless network interfaces exponentially increase when signal strength decreases. In this paper, we propose

  • Scrabble: A Fine-Grained Cache with Adaptive Merged Block
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-06
    Chao Zhang; Yuan Zeng; Xiaochen Guo

    A large fraction of the microprocessor energy is consumed by the data movement in the system. One of the reasons is the inefficiency in the conventional cache design. Cache blocks larger than a word are used in conventional caches to exploit spatial locality. However, many applications only use a small part of a cache block before its eviction. Transferring and storing unused data wastes bandwidth

  • FACCT: FAst, Compact, and Constant-Time Discrete Gaussian Sampler over Integers
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-12
    Raymond K. Zhao; Ron Steinfeld; Amin Sakzad

    The discrete Gaussian sampler is one of the fundamental tools in implementing lattice-based cryptosystems. However, a naive discrete Gaussian sampling implementation suffers from side-channel vulnerabilities, and the existing countermeasures usually introduce significant overhead in either the running speed or the memory consumption. In this paper, we propose a fast, compact, and constant-time implementation

  • Fast and Efficient Convolutional Accelerator for Edge Computing
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-16
    Arash Ardakani; Carlo Condo; Warren J. Gross

    Convolutional neural networks (CNNs) are a vital approach in machine learning. However, their high complexity and energy consumption make them challenging to embed in mobile applications at the edge requiring real-time processes such as smart phones. In order to meet the real-time constraint of edge devices, recently proposed custom hardware CNN accelerators have exploited parallel processing elements

  • 2019 Reviewers List
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2020-01-03

    Presents the reviewers who contributed to this publication in 2019.

  • 2019 Index IEEE Transactions on Computers Vol. 68
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2020-01-03

    Presents the 2019 subject/author index for this publication.

  • TAP: Reducing the Energy of Asymmetric Hybrid Last-Level Cache via Thrashing Aware Placement and Migration
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-05-16
    Jing-Yuan Luo; Hsiang-Yun Cheng; Ing-Chao Lin; Da-Wei Chang

    Emerging non-volatile memories (NVMs) have favorable properties, such as low leakage and high density, and have attracted a lot of attention in recent years. Among them, spin-transfer torque magnetoresistive random access memory (STT-MRAM) with SRAM-comparable read speed is a good candidate to build large last-level caches (LLCs). However, STT-MRAM suffers from long write latency and high write energy

  • Resilience of Randomized RNS Arithmetic with Respect to Side-Channel Leaks of Cryptographic Computation
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-06-25
    Jérôme Courtois; Lokman Abbas-Turki; Jean-Claude Bajard

    In this paper, we want to promote the influence of randomized arithmetic on the leaks during a code execution. When somebody wants to extract some specific information from these leaks, one can observe different emanations of the device like power consumption. These leaks mostly come from the variations of the Hamming distances of the successive states of the system. This phenomenon is particularly
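
The Hamming-distance leakage model mentioned in this abstract can be illustrated with a short sketch. The sequence of register states below is hypothetical, and the code only computes the distances between successive states; it does not model the randomized RNS arithmetic studied in the paper.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer states differ."""
    return bin(a ^ b).count("1")

def leakage_trace(states):
    """Hamming distance between each pair of successive states."""
    return [hamming_distance(p, q) for p, q in zip(states, states[1:])]

# Hypothetical 4-bit register contents over four clock cycles (an assumption).
trace = leakage_trace([0b0000, 0b1010, 0b1011, 0b0100])
```

Under this model, each entry of `trace` stands in for the number of bits that toggle in one transition, which is the quantity an observer of power consumption would try to correlate with secret data.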

  • A New Cube Attack on MORUS by Using Division Property
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-07-17
    Tao Ye; Yongzhuang Wei; Willi Meier

    MORUS is an authenticated encryption algorithm and one of the candidates in the CAESAR competition. Currently, the security of MORUS has received extensive attention. In this paper, a new existence terms detection method in the superpoly recovery phase of the cube attack is proposed. More precisely, the upper bounding degree of superpoly is first estimated by using the cube attack based on the division property

  • DC-PCM: Mitigating PCM Write Disturbance with Low Performance Overhead by Using Detection Cells
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-07-25
    Jungwhan Choi; Jaemin Jang; Lee-Sup Kim

    As DRAM scaling becomes ever more difficult, Phase Change Memory (PCM) is attracting attention as a new memory or storage class memory. Unfortunately, PCM cell data can be changed by frequently writing '0' to adjacent cells. This phenomenon is called Write Disturbance (WD). To mitigate WD errors with low performance overhead, we propose a Detection Cell PCM (DC-PCM). In the DC-PCM, additional cells

  • Fast Coflow Scheduling via Traffic Compression and Stage Pipelining in Datacenter Networks
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-07-30
    Qihua Zhou; Kun Wang; Peng Li; Deze Zeng; Song Guo; Baoliu Ye; Minyi Guo

    Big data analytics in datacenters often involve scheduling of data-parallel jobs. Traditional scheduling techniques based on improving network resource utilization are subject to limited bandwidth in datacenter networks. To alleviate the shortage of bandwidth, some cluster frameworks employ techniques of traffic compression to reduce transmission consumption. However, they tackle scheduling in a coarse-grained

  • Integration and Boost of a Read-Modify-Write Module in Phase Change Memory System
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-08
    Hyokeun Lee; Moonsoo Kim; Hyunchul Kim; Hyun Kim; Hyuk-Jae Lee

    Phase-change memory (PCM) is a non-volatile memory device with favorable characteristics such as persistence, byte-addressability, and lower latency when compared to flash memory. However, it comprises memory cells that have limited lifetime and higher access latency than DRAM. The row buffer size of a PCM is preferred to be larger than 128B to fill the latency gap between two memories and to reduce

  • Improving Availability of Multicore Real-Time Systems Suffering Both Permanent and Transient Faults
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-14
    Junlong Zhou; Xiaobo Sharon Hu; Yue Ma; Jin Sun; Tongquan Wei; Shiyan Hu

    CMOS scaling has greatly increased concerns for both lifetime reliability due to permanent faults and soft-error reliability due to transient faults. Most existing works only focus on one of the two reliability concerns, but often times techniques used to increase one type of reliability may adversely impact the other type. A few efforts do consider both types of reliability together and use two different

  • OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-20
    Ipoom Jeong; Changmin Lee; Keunsoo Kim; Won Woo Ro

    Coarse-grained instruction commit mechanisms enabled the effective size of the instruction window to be as large as possible by committing a group of instructions atomically. Within a group, the reorder buffer (ROB) and physical register file (PRF) entries are conservatively managed, and thus the instruction window can handle more in-flight instructions beyond the hardware limit. However, previous approaches

  • A Low-Power, High-Performance Speech Recognition Accelerator
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-08-26
    Reza Yazdani; Jose-Maria Arnau; Antonio González

    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at high energy cost, not being affordable for the tiny power-budgeted mobile devices. Hardware acceleration reduces energy consumption of ASR systems, while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent

  • Better Circuits for Binary Polynomial Multiplication
    IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-04
    Magnus Gaudal Find; René Peralta

    We develop a new and simple way to describe Karatsuba-like algorithms for multiplication of polynomials over $\mathbb{F}_2$. We restrict the search of small circuits to a class of circuits we call symmetric bilinear. These are circuits in which AND gates only compute functions of the form $\sum_{i \in S} a_i \cdot \sum_{i \in S} b_i$ ($S \subseteq \{0, \ldots, n-1\}$). These techniques yield improved recurrences for $M(kn)$, the number of gates
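
For orientation, here is a minimal sketch of the classic two-way Karatsuba recursion for polynomials over GF(2), with each polynomial packed into a Python integer (bit i holds the coefficient of x^i). It is the textbook algorithm, not the improved circuits from the paper; the function names and the base-case threshold of 4 are assumptions.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two GF(2) polynomials packed into ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition over GF(2) is XOR
        a <<= 1
        b >>= 1
    return result

def karatsuba_gf2(a, b, n):
    """Multiply two GF(2) polynomials that each have fewer than n coefficients."""
    if n <= 4:            # small base case (threshold is an assumption)
        return clmul(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half)
    hi = karatsuba_gf2(a_hi, b_hi, half)
    # Third product: sum of the a-halves times the same-indexed sum of the b-halves.
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half)
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))

a, b = 0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF
product = karatsuba_gf2(a, b, 64)
```

Each level replaces four half-size products with three, and the third product multiplies a sum of parts of one operand by the same-indexed sum of parts of the other, which matches the shape of the form quoted in the abstract.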

Contents have been reproduced by permission of the publishers.