• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-11
Jyotikrishna Dass; Yashwardhan Narawane; Rabi N. Mahapatra; Vivek Sarin

Support Vector Machine (SVM) is a supervised machine learning model for classification tasks. Training SVM on a large number of data samples is challenging due to the high computational cost and memory requirement. Hence, model training is supported on a high-performance server which typically runs a sequential training algorithm on centralized data. However, as we move towards massive workloads, it

Updated: 2020-05-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-19
Pierangelo Di Sanzo; Alessandro Pellegrini; Marco Sannicandro; Bruno Ciciani; Francesco Quaglia

Software Transactional Memory (STM) stands as a powerful concurrent programming paradigm, enabling atomicity and isolation while accessing shared data. On the downside, STM may suffer from performance degradation due to excessive conflicts among concurrent transactions, which waste CPU cycles and energy because of transaction aborts. An approach to cope with this issue consists of putting in

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-09
Sarani Bhattacharya; Clémentine Maurice; Shivam Bhasin; Debdeep Mukhopadhyay

In recent years, performance counters have been used as a side channel source to monitor branch mispredictions, in order to attack cryptographic algorithms. However, the literature considers blinding techniques as effective countermeasures against such attacks. In this article, we present the first template attack on the branch predictor. We target blinded scalar multiplications with a side-channel

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-27

The emergence of Solid-State Drives (SSDs) has transformed the data storage industry, where they are rapidly replacing Hard Disk Drives (HDDs) due to their superiority in performance and power. Meanwhile, SSDs have reliability issues due to bit errors, bad blocks, and bad chips. To improve reliability, Redundant Array of Independent Disks (RAID) configurations, originally proposed to increase both performance

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31
Yongzhi Wang; Yulong Shen; Cuicui Su; Jiawen Ma; Lingtong Liu; Xuewen Dong

SQLite, one of the most popular lightweight database systems, has been widely used in various systems. However, the compact design of SQLite did not give enough consideration to user data security. Specifically, anyone who has obtained access to the database file is able to read or tamper with the data. Existing encryption-based solutions can only protect data on storage, while still exposing

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31
ZhiSheng Huo; Limin Xiao; Minyi Guo; Xiaoling Rong

Solid-state drives (SSDs) have been added to storage systems to improve their performance, which introduces heterogeneity into the storage medium. Throughput is one of the essential resources in heterogeneous storage systems, and how throughput is allocated plays a crucial role in user performance. There are many studies on the throughput allocation of heterogeneous storage

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03
Leilei Yu; Zhichang Lin; Sian-Jheng Lin; Yunghsiang S. Han; Nenghai Yu

This article describes a fast Reed–Solomon encoding algorithm with four and seven parity symbols in between. First, we show that the syndrome of Reed–Solomon codes can be computed via the Reed–Muller transform. Based on this result, the fast encoding algorithm is then derived. Analysis shows that the proposed approach asymptotically requires 3 XORs per data bit, representing an improvement over previous

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03
Davide Zoni; Luca Cremona; William Fornaciari

The Internet-of-Things (IoT) revolution fueled new challenges and opportunities to achieve computational efficiency goals. Embedded devices are required to execute multiple applications for which a suitable distribution of the computing power must be adapted at run-time. Such complex hardware platforms have to sustain the continuous acquisition and processing of data under severe energy budget constraints

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-06
Chun-Feng Wu; Yuan-Hao Chang; Ming-Chang Yang; Tei-Wei Kuo

To provide larger memory space with lower costs, NVDIMM is a production-ready device. However, directly placing NVDIMM as the main memory would seriously degrade the system performance because of the “great memory wall” caused by the fact that in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this article, we present a joint

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-07
Debjyoti Bhattacharjee; Yaswanth Tavva; Arvind Easwaran; Anupam Chattopadhyay

In-memory computing has gained significant attention due to the potential for dramatic improvement in speed and energy. Redox-based resistive RAMs (ReRAMs), capable of non-volatile storage and logic operations simultaneously have been used for logic-in-memory computing approaches. To this effect, we propose ReRAM based VLIW Architecture for in-Memory comPuting (ReVAMP), supported by a detailed

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10
Jingwei Sun; Guangzhong Sun; Shiyan Zhan; Jiepeng Zhang; Yong Chen

Automated performance modeling and performance prediction of parallel programs are highly valuable in many use cases, such as in guiding task management and job scheduling, offering insights of application behaviors, and assisting resource requirement estimation. The performance of parallel programs is affected by numerous factors, including but not limited to hardware, applications, algorithms, and

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10
Matteo Biasielli; Cristiana Bolchini; Luca Cassano; Erdem Koyuncu; Antonio Miele

Traditional reliability approaches introduce relevant costs to achieve unconditional correctness during data processing. However, many application environments are inherently tolerant to a certain degree of inexactness or inaccuracy. In this article, we focus on the practical scenario of image processing in space, a domain where faults are a threat, while the applications are inherently tolerant to

Updated: 2020-04-22
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20
Chao Wang; Lei Gong; Xiang Ma; Xi Li; Xuehai Zhou

Recommendation algorithms, such as Neighborhood-based Collaborative Filtering (CF), have been widely applied in various emerging machine learning applications. However, the explosive growth of big data poses significant challenges to CF recommendation algorithms, which are becoming quite time- and energy-consuming. They have to be optimized and accelerated by powerful engines to process

Updated: 2020-04-20
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-02
Byeongho Kim; Jongwook Chung; Eojin Lee; Wonkyung Jung; Sunjung Lee; Jaewan Choi; Jaehyun Park; Minbok Wi; Sukhan Lee; Jung Ho Ahn

Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs have poor reusability and the ever-increasing size of the matrices becomes too large to fit in the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul is determined by those of main-memory DRAM. Therefore, computing MV-mul

Updated: 2020-04-02
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-02
Qiang Liu; Shuzhen Qin; Bo Yu; Jie Tang; Shaoshan Liu

Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, street view map generation, and even space exploration. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large

Updated: 2020-04-02
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-20
Yongwei Zhao; Zhe Fan; Zidong Du; Tian Zhi; Ling Li; Qi Guo; Shaoli Liu; Zhiwei Xu; Tianshi Chen; Yunji Chen

Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of programming productivity. However, with the fast development

Updated: 2020-03-20
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-18
Yijin Guan; Guangyu Sun; Zhihang Yuan; Xingchen Li; Ningyi Xu; Shu Chen; Jason Cong; Yuan Xie

Convolutional neural networks (CNNs) have achieved great success in numerous AI applications. To improve inference efficiency of CNNs, researchers have proposed various pruning techniques to reduce both computation intensity and storage overhead. These pruning techniques result in multi-level sparsity irregularities in CNNs. Together with that in activation matrices, which is induced by employment

Updated: 2020-03-18
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-11
Ahmed Louri

Presents the introductory editorial for this issue of the publication.

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-15
Elizabeth Adams; Suganthi Venkatachalam; Seok-Bum Ko

Approximate computing can be used in error-resilient applications to reduce power consumption and increase overall circuit performance. This article introduces two approximate dividers with restoring array-based architecture that achieve substantial hardware savings while maintaining high accuracy when compared to existing approximate designs. The first design replaces exact restoring divider cells

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-18
Qiao Li; Liang Shi; Yufei Cui; Chun Jason Xue

By stacking layers vertically, the adoption of 3D NAND has significantly increased the capacity for storage systems. The complex structure of 3D NAND introduces more errors than planar flash. To address the reliability issue, low-density parity-check (LDPC) code with a strong error correction capability is now widely applied on 3D NAND flash memory. However, LDPC has long decoding latency when the

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-31
Anastasia Volkova; Thibault Hilaire; Christoph Lauter

In this paper we target the Fixed-Point (FxP) implementation of Linear Time-Invariant (LTI) filters evaluated with state-space equations. We assume that wordlengths are fixed and that our goal is to determine binary point positions that guarantee the absence of overflows while maximizing accuracy. We provide a model for the worst-case error analysis of FxP filters that gives tight bounds on the output

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-26
Marc Fyrbiak; Sebastian Wallat; Sascha Reinhard; Nicolai Bissantz; Christof Paar

Hardware reverse engineering is a powerful and universal tool for both security engineers and adversaries. From a defensive perspective, it allows for detection of intellectual property infringements and hardware Trojans, while it simultaneously can be used for product piracy and malicious circuit manipulations. From a designer's perspective, it is crucial to have an estimate of the costs associated

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-09
Neng Zhang; Qiao Qin; Hang Yuan; Chenggao Zhou; Shouyi Yin; ShaoJun Wei; Leibo Liu

Large integer multiplication, or large degree polynomial multiplication, is the most time-consuming operation in fully homomorphic encryption (FHE). Low area and power consumption are difficult to maintain while achieving high performance for a large size multiplier. To address this issue, an area-efficient low-power architecture for multiplication, named NTTU, is proposed in this article. First, a

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-04
Rei Ueno; Sumio Morioka; Noriyuki Miura; Kohei Matsuda; Makoto Nagata; Shivam Bhasin; Yves Mathieu; Tarik Graba; Jean-Luc Danger; Naofumi Homma

This article proposes highly efficient Advanced Encryption Standard (AES) hardware architectures that support encryption and both encryption and decryption. New operation-reordering and register-retiming techniques presented in this article allow us to unify the inversion circuits in SubBytes and InvSubBytes without any delay overhead. In addition, a new optimization technique for minimizing linear

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-20
David Kuang-Hui Yu; Jen-Wei Hsieh

As flash memory technology has been scaled down to 1x nm and more bits can be stored in a cell, the storage density of flash memory has been significantly improved. However, these technical trends also severely hurt the programming speed and endurance of flash memory. The internal data retention time is the duration for which a flash cell can correctly hold data. By relaxing internal data retention

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-28
Shuang Wang; Xiaoping Li; Rubén Ruiz

In this article, we consider the problem of selecting appropriate heterogeneous servers in cloud centers for stochastically arriving requests in order to obtain an optimal tradeoff between the expected response time and power consumption. Heterogeneous servers with uncertain setup times are far more common than homogeneous ones. The heterogeneity of servers and stochastic requests pose great challenges

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-18
Xiyue Xiang; Purushottam Sigdel; Nian-Feng Tzeng

A bufferless network-on-chip (NoC) can deliver high energy efficiency, but such a NoC is subject to growing deflection when its traffic load rises. This article proposes Deflection Containment (DeC) for the bufferless NoC to address its notorious shortcomings of excessive deflection for performance improvement and energy savings. With multiple subnetworks bridged by an added link between two corresponding

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-19
Jiang Zhou; Yong Chen; Wei Xie; Dong Dai; Shuibing He; Weiping Wang

Data replication is a key technique to achieve high data availability, reliability, and optimized performance in distributed storage systems. In recent years, with emerged new storage devices, heterogeneous object-based storage systems, such as a storage system with a mix of hard disk drives, solid state drives, and other non-volatile memory devices have become increasingly attractive since they combine

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-18
Nicola Bombieri; Federico Busato; Alessandro Danese; Luca Piccolboni; Graziano Pravadelli

Likely invariants model properties that hold in operating conditions of a computing system. Dynamic mining of invariants aims at extracting logic formulas representing such properties from the system execution traces, and it is widely used for verification of intellectual property (IP) blocks. Although the extracted formulas represent likely invariants that hold in the considered traces, there is no

Updated: 2020-03-16
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-13
Hongwu Jiang; Xiaochen Peng; Shanshi Huang; Shimeng Yu

Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to the extensive computation and memory bandwidth requirements. To solve the bottleneck of the memory wall problem, the compute-in-memory (CIM) approach exploits analog computation along the bit line of the memory array and thus significantly speeds up the vector-matrix

Updated: 2020-03-13
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-03-05
Xi Zeng; Tian Zhi; Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Bingrui Wang; Yuanbo Wen; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Ninghui Sun; Yunji Chen

Neural networks have become the dominant algorithms rapidly as they achieve state-of-the-art performance in a broad range of applications such as image recognition, speech recognition, and natural language processing. However, neural networks keep moving toward deeper and larger architectures, posing a great challenge to hardware systems due to the huge amount of data and computations. Although sparsity

Updated: 2020-03-05
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-21
Liancheng Jia; Yun Liang; Xiuhong Li; Liqiang Lu; Shengen Yan

Modern Convolutional Neural Networks (CNNs) require a massive amount of convolution operations. To address the overwhelming computation problem, Winograd and FFT fast algorithms have been used as effective approaches to reduce the number of multiplications. Inputs and filters are transformed into special domains and then multiplied element-wise, which can be transformed into batched GEMM operation

Updated: 2020-02-21
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-17
Mineto Tsukada; Masaaki Kondo; Hiroki Matsutani

Semi-supervised anomaly detection is an approach to identify anomalies by learning the distribution of normal data. Approaches based on backpropagation neural networks (BP-NNs) have recently drawn attention because of their good generalization capability. In a typical situation, BP-NN-based models are iteratively optimized in server machines with input data gathered from the edge devices. However

Updated: 2020-02-17
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-07
Nader Sehatbakhsh; Alireza Nazari; Monjur Alam; Frank Werner; Yuanda Zhu; Alenka Zajic; Milos Prvulovic

Cyber-physical systems (CPS) are controlling many critical and sensitive aspects of our physical world while being continuously exposed to potential cyber-attacks. These systems typically have limited performance, memory, and energy reserves, which limits their ability to run existing advanced malware protection, and that, in turn, makes securing them very challenging. To tackle these problems, this

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-21
Jingwei Hu; Marco Baldi; Paolo Santini; Neng Zeng; San Ling; Huaxiong Wang

In this paper, we present a lightweight hardware design for a recently proposed quantum-safe key encapsulation mechanism based on QC-LDPC codes called LEDAkem, which has been admitted as a round-2 candidate to the NIST post-quantum standardization project. Existing implementations focus on high speed while few of them take into account area or power efficiency, which are particularly decisive for low-cost

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-21
Leonel Sousa; Rogério Paludo; Paulo Martins; Hector Pettenghi

The conversion from a Residue Number System (RNS) to a weighted representation is a costly inter-modulo operation that introduces delay and area overhead to RNS processors, while also increasing power consumption. This paper proposes a new approach to decompose the reverse conversion into operations that can be processed by the arithmetic units already present in the RNS independent channels. This

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-23
Arman Roohi; Shadi Sheikhfaal; Shaahin Angizi; Deliang Fan; Ronald F DeMara

A Generative Adversarial Network (GAN) is an adversarial learning approach which empowers conventional deep learning methods by alleviating the demands of massive labeled datasets. However, GAN training can be computationally intensive, limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithm and hardware

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-23

By injecting faults, active physical attacks pose serious threats to cryptographic hardware where Concurrent Error Detection (CED) schemes are promising countermeasures. They are usually based on an Error-Detecting Code (EDC) which enables detecting certain injected faults depending on the specification of the underlying code. Here, we propose a methodology to enable correct, practical, and robust

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-25
Myeonggyun Han; Jihoon Hyun; Seongbeom Park; Woongki Baek

Heterogeneous memory systems that comprise memory nodes with disparate architectural characteristics (e.g., DRAM and high-bandwidth memory (HBM)) have surfaced as a promising solution in a variety of computing domains ranging from embedded to high-performance computing. Since deep learning (DL) is one of the most widely-used workloads in various computing domains, it is crucial to explore efficient

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-25
Alejandro Morán; Christiam F. Frasser; Miquel Roca; Josep L. Rosselló

The development of power-efficient Machine Learning Hardware is of high importance to provide Artificial Intelligence (AI) characteristics to those devices operating at the Edge. Unfortunately, state-of-the-art data-driven AI techniques such as deep learning are too costly in terms of hardware and energy requirements for Edge Computing (EC) devices. Recently, Cellular Automata (CA) have been proposed

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-28
Michael Lunglmayr; Daniel Wiesinger; Werner Haselmayr

In stochastic computing (SC), a real-valued number is represented by a stochastic bit stream, encoding its value in the probability of obtaining a one. This leads to a significantly lower hardware effort for various functions and provides a higher tolerance to errors (e.g., bit flips) compared to binary radix representation. The implementation of a stochastic max/min function is important for many

Updated: 2020-02-11
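
The stochastic bit-stream representation described in the entry above can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`encode`, `decode`, `multiply`) are hypothetical, and the AND-gate multiplication shown is the standard textbook operation for independent unipolar streams, not the max/min function the article addresses.

```python
import random


def encode(value: float, length: int, rng: random.Random) -> list[int]:
    """Encode a value in [0, 1] as a bit stream whose probability of a one equals the value."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits: list[int]) -> float:
    """Estimate the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


def multiply(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Multiply two independent streams with a bitwise AND (one gate per bit in hardware)."""
    return [x & y for x, y in zip(a_bits, b_bits)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    a = encode(0.3, 20_000, rng)
    b = encode(0.5, 20_000, rng)
    print("a      ~", decode(a))
    print("a * b  ~", decode(multiply(a, b)))
```

The decoded values are estimates whose accuracy improves with stream length, and a single flipped bit changes the estimate by only `1 / length`, which is the error tolerance the abstract refers to.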
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-28
Yuan Yao; Zhonghai Lu

In sharp contrast to conventional performance indicative based Network-on-Chip (NoC) DVFS, where the direct relation between application performance and NoC power consumption is missing, we exploit the concept of Performance-Power Characteristic Curve (PPCC) newly proposed in the literature to approach maximum NoC power efficiency. PPCC, which defines the direct relation between application performance

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-31
Levent Aksoy; Mustafa Altun

Two-dimensional switching lattices including four-terminal switches are introduced as alternative structures to realize logic functions, aiming to outperform the designs consisting of one-dimensional two-terminal switches. Exact and approximate algorithms have been proposed for the problem of finding a switching lattice which implements a given logic function and has the minimum size, i.e., a minimum

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-20
Xiaoliang Dai; Hongxu Yin; Niraj K. Jha

Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level nonlinear control

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-29

The branch predictor unit (BPU) is among the main energy consuming components in out-of-order (OoO) processors. For integer applications, we find 16 percent of the processor energy is consumed by the BPU. BPU is accessed in parallel with the instruction cache before it is known if a fetch group contains control instructions. We find 85 percent of BPU lookups are done for non-branch operations, and

Updated: 2020-02-11
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-02-10
Souvik Kundu; Mahdi Nazemi; Massoud Pedram; Keith M. Chugg; Peter A. Beerel

The high energy cost of processing deep convolutional neural networks impedes their ubiquitous deployment in energy-constrained platforms such as embedded systems and IoT devices. This article introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters. Due to the efficient storage of our periodic sparse kernels, the parameter

Updated: 2020-02-10
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-31
Rui Han; Chi Harold Liu; Shilin Li; Shilin Wen; Xue Liu

Modern distributed engines are increasingly deployed to accelerate large-scale deep learning (DL) training jobs. While the parallelism of distributed workers/nodes promises scalability, the computation and communication overheads of the underlying iterative solving algorithms, e.g., stochastic gradient descent, unfortunately become the bottleneck for distributed DL training jobs. Existing approaches

Updated: 2020-01-31
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-30
Çetin Kaya Koç

This article describes and analyzes all existing algorithms for computing $x=a^{-1}\pmod {p^k}$ for a prime $p$, and also introduces a new algorithm based on the exact solution of linear equations using $p$-adic expansions. The algorithm starts with the initial value $c=a^{-1}\pmod {p}$ and iteratively computes the digits of the inverse $x=a^{-1}\pmod {p^k}$ in base $p$. The mod 2 version of the

Updated: 2020-01-30
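
The digit-by-digit idea in the entry above (start from $c=a^{-1}\pmod{p}$, then produce the base-$p$ digits of $a^{-1}\pmod{p^k}$) can be sketched in Python. This is a generic illustration under stated assumptions, not the article's algorithm: the function name `inverse_mod_prime_power` and the remainder-update scheme are chosen here for illustration, and the seed inverse relies on the built-in `pow(a, -1, p)`, which requires Python 3.8 or later.

```python
def inverse_mod_prime_power(a: int, p: int, k: int) -> int:
    """Return x in [0, p**k) with a * x == 1 (mod p**k), built one base-p digit at a time.

    Assumes p is prime and a is not divisible by p (illustrative sketch only).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if a % p == 0:
        raise ValueError("a must be invertible modulo p")

    c = pow(a, -1, p)   # seed value c = a^{-1} mod p
    r = 1               # running right-hand side of a * x = r, solved p-adically
    x = 0
    p_power = 1         # p**i for the digit currently being produced

    for _ in range(k):
        digit = (c * r) % p          # digit satisfies a * digit == r (mod p)
        x += digit * p_power
        r = (r - a * digit) // p     # exact division, since p divides r - a * digit
        p_power *= p
    return x


if __name__ == "__main__":
    x = inverse_mod_prime_power(3, 2, 4)
    print(x, (3 * x) % 2**4)
```

Each iteration uses only arithmetic modulo $p$ plus one exact division by $p$, so the loop never has to invert anything modulo the full $p^k$.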
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-30
Chen Zhang; Qiang Cao; Hong Jiang; Wenhui Zhang; Jingjun Li; Jie Yao

Surveillance cameras are ubiquitous around us. Emerging full-feature object-detection models can analyze surveillance videos with high accuracy but consume much computation. Directly applying these models for practical scenarios with large-scale cameras is prohibitively expensive. This, however, is wasteful and unnecessary considering that user-defined anomalies occur rarely among these videos. Therefore

Updated: 2020-01-30
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-28
Young Geun Kim; Minyong Kim; Joonho Kong; Sung Woo Chung

Off-the-shelf embedded systems have adopted heterogeneous multi-core processors which have high-performance big cores and low-power small cores. Though there are two different types of cores in heterogeneous multi-core processors, conventional DVFS (Dynamic Voltage and Frequency Scaling)-based DTM (Dynamic Thermal Management) techniques do not utilize the different types of cores to cool down hot cores

Updated: 2020-01-28
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23
Lorenzo Servadei; Edoardo Mosca; Elena Zennaro; Keerthikumara Devarajegowda; Michael Werner; Wolfgang Ecker; Robert Wille

Hardware/software co-designs are usually defined at high levels of abstractions at the beginning of the design process in order to provide a variety of options on how to realize a system. This allows for design exploration which relies on knowing the costs of different design configurations (with respect to hardware usage and firmware metrics). To this end, methods for cost estimation are frequently

Updated: 2020-01-23
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23

Making approximate computing specific to user requirements is crucial to system performance, energy-efficiency, and reliability. However, developing hardware for such optimization becomes a significant challenge due to the high cost of examining all potential choices while exploring a large design space. One determinant aspect of exploring a design space is the efficiency of evaluating error metrics

Updated: 2020-01-23
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-23
Yufeng Zhan; Song Guo; Peng Li; Jiang Zhang

Edge computing is a new paradigm to provide strong computing capability at the edge of pervasive radio access networks close to users. A critical research challenge of edge computing is to design an efficient offloading strategy to decide which tasks can be offloaded to edge servers with limited resources. Although many research efforts attempt to address this challenge, they need centralized control

Updated: 2020-01-23
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
Jiwoong Park; Heonyoung Yeom; Yongseok Son

Most modern multi-core processors provide a shared last level cache (LLC) where data from all cores are placed to improve performance. However, this opens a new challenge for cache management, owing to cache pollution. With cache pollution, data with weak temporal locality can evict other data with strong temporal locality when both are mapped into the same cache set. In this article, we propose page

Updated: 2020-01-21
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
Zhifang Li; Beicheng Peng; Chuliang Weng

Nowadays, GPUs have achieved high throughput computing by running plenty of threads. However, owing to disjoint memory spaces of discrete CPU-GPU systems, exploiting CPU and GPU within a data processing pipeline is a non-trivial issue, which can only be resolved by the coarse-grained workflow of “copy-kernel-copy” or its variants in essence. There is an underlying bottleneck caused by frequent inter-processor

Updated: 2020-01-21
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
Ming-Chang Yang; Yuan-Hao Chang; Tei-Wei Kuo; Chun-Feng Wu

Performance-intensive applications have led to both interface and architecture changes in high-end, growing-scale solid-state drives (SSDs). However, we observe that most of the time, the actual drive performance cannot be easily scaled or boosted by increasing the internal resources of growing-scale SSDs due to the potential congestion of I/O requests. Such observation inspires this article

Updated: 2020-01-21
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-21
Wenqi Cao; Ling Liu

This article presents XMemPod, a hierarchical disaggregated memory orchestration system. XMemPod virtualizes cluster wide memory to scale large memory workloads in virtualized clouds. It makes three novel contributions: (1) XMemPod offers efficient, transparent, and dynamic sharing of available memory that is disaggregated across VMs on the same host or in the cluster. (2) XMemPod provides a hierarchical

Updated: 2020-01-21
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-09-27
Minsik Oh; Kwangsu Kim; Duheon Choi; Hyuk-Jun Lee; Eui-Young Chung

Recently, a hybrid cache consisting of SRAM and STT-RAM has attracted much attention as a future memory by complementing each other with different memory characteristics. Prior works focused on developing data allocation and migration techniques considering write-intensity to reduce write energy at STT-RAM. However, these works often neglect the impact of operation-specific reusability of a cache line

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-04
Shinobu Miwa; Masaya Ishihara; Hayato Yamaki; Hiroki Honda; Martin Schulz

Power-efficiency has become one of the most critical concerns for HPC as we continue to scale computational capabilities. A significant fraction of system power is spent on large main memories, mainly caused by the substantial amount of DIMM standby power needed. However, while necessary for some workloads, for many workloads large memory configurations are too rich, i.e., these workloads only make

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-10-04
Amit Kumar Singh; Karunakar Reddy Basireddy; Alok Prakash; Geoff V. Merrett; Bashir M. Al-Hashimi

Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are becoming prevalent in embedded computing, and they need to execute applications concurrently. However, existing run-time management approaches do not perform adaptive mapping and thread-partitioning of applications while exploiting both CPU and GPU cores at the same time. In this paper, we propose an adaptive mapping and thread-partitioning

Updated: 2020-01-17
Contents have been reproduced by permission of the publishers.
