-
Guest Editors' Introduction to the Special Issue on Hardware Security IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-10-07 Amro Awad; Rujia Wang
The twelve papers in this special section focus on hardware security. This topic is becoming a significant challenge in modern computing systems. Recently discovered hardware vulnerabilities, such as Spectre and Meltdown, are striking evidence that today’s computing systems are untenable without deliberate consideration of the security aspects at the design time. The papers address various topics related
-
Enabling Secure NVM-Based in-Memory Neural Network Computing by Sparse Fast Gradient Encryption IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-19 Yi Cai; Xiaoming Chen; Lu Tian; Yu Wang; Huazhong Yang
Neural network (NN) computing is energy-consuming on traditional computing systems, owing to the inherent memory wall bottleneck of the von Neumann architecture and Moore's Law approaching its end. Non-volatile memories (NVMs) have been demonstrated as promising alternatives for constructing computing-in-memory (CIM) systems to accelerate NN computing. However, NVM-based NN computing systems
-
Understanding Selective Delay as a Method for Efficient Secure Speculative Execution IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-05 Christos Sakalis; Stefanos Kaxiras; Alberto Ros; Alexandra Jimborean; Magnus Själander
Since the introduction of Meltdown and Spectre, the research community has been tirelessly working on speculative side-channel attacks and on how to shield computer systems from them. To ensure that a system is protected not only from all the currently known attacks but also from future, yet to be discovered, attacks, the solutions developed need to be general in nature, covering a wide array of system
-
2.5D Root of Trust: Secure System-Level Integration of Untrusted Chiplets IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-09-01 Mohammed Nabeel; Mohammed Ashraf; Satwik Patnaik; Vassos Soteriou; Ozgur Sinanoglu; Johann Knechtel
For the first time, we leverage the 2.5D interposer technology to establish system-level security in the face of hardware- and software-centric adversaries. More specifically, we integrate chiplets (i.e., third-party hard intellectual property of complex functionality, like microprocessors) using a security-enforcing interposer. Such hardware organization provides a robust 2.5D root of trust for trustworthy
-
Instruction Sequence Identification and Disassembly Using Power Supply Side-Channel Analysis IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-19 Deepak Krishnankutty; Zheng Li; Ryan Robucci; Nilanjan Banerjee; Chintan Patel
Embedded systems are prone to leak information via side-channels associated with their physical internal activity, such as power consumption, timing, and faults. Leaked information can be analyzed to extract sensitive data and devices should be assessed for such vulnerabilities. Side-channel power-supply leakage from embedded devices can also provide information regarding instruction-level activity
-
MTHAEL: Cross-Architecture IoT Malware Detection Based on Neural Network Advanced Ensemble Learning IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-11 Danish Vasan; Mamoun Alazab; Sitalakshmi Venkatraman; Junaid Akram; Zheng Qin
The complexity, sophistication, and impact of malware evolve with industrial revolution and technology advancements. This article discusses and proposes a robust cross-architecture IoT malware threat hunting model based on advanced ensemble learning (MTHAEL). Our unique MTHAEL model using stacked ensemble of heterogeneous feature selection algorithms and state-of-the-art neural networks to learn different
-
Side-Channel Analysis and Countermeasure Design on ARM-Based Quantum-Resistant SIKE IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-31 Fan Zhang; Bolin Yang; Xiaofei Dong; Sylvain Guilley; Zhe Liu; Wei He; Fangguo Zhang; Kui Ren
The implementations of post-quantum cryptographic algorithms have been newly explored, whereas, the protection against side-channel attacks shall be considered upfront, since it can have a non-negligible impact on security and performance. In this article, the security of supersingular isogeny key encapsulation (SIKE), a second-round candidate of NIST's on-going post-quantum standardization process
-
Elliptic Curve Cryptography Point Multiplication Core for Hardware Security Module IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-08-05 Mohamad Ali Mehrabi; Christophe Doche; Alireza Jolfaei
In today's technology, a large number of Internet of Things applications use hardware security modules for secure communications. The widely used algorithms in security modules, for example, digital signatures and key agreement, are based upon elliptic curve cryptography (ECC). A core operation used in ECC is the point multiplication, which is computationally expensive for many Internet of Things applications
-
SCAUL: Power Side-Channel Analysis With Unsupervised Learning IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-07-30 Keyvan Ramezanpour; Paul Ampadu; William Diehl
Existing power analysis techniques rely on strong adversary models with prior knowledge of the leakage or training data. We introduce side-channel analysis with unsupervised learning (SCAUL) that can recover the secret key without requiring prior knowledge or profiling (training). We employ an LSTM auto-encoder to extract features from power traces with high mutual information with the data-dependent
-
Built-in Security Computer: Deploying Security-First Architecture Using Active Security Processor IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-07-24 Dan Meng; Rui Hou; Gang Shi; Bibo Tu; Aimin Yu; Ziyuan Zhu; Xiaoqi Jia; Yu Wen; Yun Yang
Continually disclosed vulnerabilities reveal that traditional computer architecture lacks consideration of security. This article proposes a security-first architecture, with an Active Security Processor (ASP) integrated into conventional computer architectures. To reduce the attack surface of ASP and improve the security of the whole system, the ASP is physically isolated from Computation Processor
-
Guest Editorial: IEEE TC Special Issue on Domain-Specific Architectures for Emerging Applications IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-07-08 Lisa Wu Wills; Karthik Swaminathan
The papers in this special section examine domain-specific architectures for emerging applications. They present innovative research in domain-specific architectures across a broad range of emerging applications.
-
Neuromorphic System for Spatial and Temporal Information Processing IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-05 Abdullah M. Zyarah; Kevin Gomez; Dhireesha Kudithipudi
Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this article, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally
-
Accelerating Deep Neural Network In-Situ Training With Non-Volatile and Volatile Memory Based Hybrid Precision Synapses IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-05 Yandong Luo; Shimeng Yu
Compute-in-memory (CIM) with emerging non-volatile memories (eNVMs) is time and energy efficient for deep neural network (DNN) inference. However, challenges still remain for DNN in-situ training with eNVMs due to the asymmetric weight update behavior, high programming latency and energy consumption. To overcome these challenges, a hybrid precision synapse combining eNVMs with a capacitor has been proposed
-
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-29 Aayush Ankit; Izzat El Hajj; Sai Rahul Chalamalasetti; Sapan Agarwal; Matthew Marinella; Martin Foltin; John Paul Strachan; Dejan Milojicic; Wen-Mei Hwu; Kaushik Roy
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at
-
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-08 Tianqi Wang; Tong Geng; Ang Li; Xi Jin; Martin Herbordt
Deep convolutional Neural Networks (CNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling CNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. Among the issues with this approach is that, to make the distributed cluster work with high utilization, the
-
Accelerating Hyperdimensional Computing on FPGAs by Exploiting Computational Reuse IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-06 Sahand Salamat; Mohsen Imani; Tajana Rosing
Brain-inspired hyperdimensional (HD) computing emulates cognition by computing with long-size vectors. HD computing consists of two main modules: encoder and associative search. The encoder module maps inputs into high dimensional vectors, called hypervectors. The associative search finds the closest match between the trained model (set of hypervectors) and a query hypervector by calculating a similarity
-
Accelerating Generative Neural Networks on Unmodified Deep Learning Processors—A Software Approach IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-09 Dawen Xu; Cheng Liu; Ying Wang; Kaijie Tu; Bingsheng He; Lei Zhang
Generative neural networks are a new category of neural networks that has been widely utilized in many applications such as content generation, unsupervised learning, segmentation, and pose estimation. They typically involve massive computing-intensive deconvolution operations that cannot be fitted to conventional neural network processors directly. However, prior works mainly investigated specialized
-
PaRTAA: A Real-Time Multiprocessor for Mixed-Criticality Airborne Systems IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-16 Shibarchi Majumder; Jens Frederik Dalsgaard Nielsen; Thomas Bak
Mixed-criticality systems, where multiple systems with varying criticality-levels share a single hardware platform, require isolation between tasks with different criticality-levels. Isolation can be achieved with software-based solutions or can be enforced by a hardware level partitioning. An asymmetric multiprocessor architecture offers hardware-based isolation at the cost of underutilized hardware
-
Collaborative Accelerators for Streamlining MapReduce on Scale-up Machines With Incremental Data Aggregation IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-22 Abraham Addisie; Valeria Bertacco
The MapReduce programming paradigm has been increasingly adopted to implement data-intensive applications processing both small and large scale datasets. As most jobs in data centers have a data footprint in the order of gigabytes, emerging high-end scale-up machines are capable of running most data center processing tasks, thus significantly improving power and server density. However, this approach
-
A Lightweight Detection Algorithm For Collision-Optimized Divide-and-Conquer Attacks IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-16 Changhai Ou; Siew-Kei Lam; Chengju Zhou; Guiyuan Jiang; Fan Zhang
By introducing collision information into divide-and-conquer attacks, several existing works transform the original candidate space, which may be too large to enumerate, into a significantly smaller collision space, thus making key recovery possible. However, the inefficient collision detection algorithms and fault tolerance mechanisms make them time-consuming and their success rate low. Moreover,
-
A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-06-05 Liwei Zhou; Yang Hu; Yiorgos Makris
Beneath the potential benefits of the rapidly growing Internet of Things (IoT) technology lurk security risks. In this article, we propose a hardware-based generic framework for IoT workload forensics, an infrastructural technique to securely monitor and ensure delivered IoT services in accordance with specifications and regulatory compliance. In particular, this technique identifies digital workloads
-
OPTIMUS: A Security-Centric Dynamic Hardware Partitioning Scheme for Processors that Prevent Microarchitecture State Attacks IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-20 Hamza Omar; Brandon D'Agostino; Omer Khan
Hardware virtualization allows multiple security-critical and ordinary (insecure) processes to co-execute on a processor. These processes temporally share hardware resources and endure numerous security threats on the microarchitecture state. State-of-the-art secure processor architectures, such as MI6 and IRONHIDE enable capabilities to execute security-critical processes in hardware isolated enclaves
-
Distributed Training of Support Vector Machine on a Multiple-FPGA System IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-05-11 Jyotikrishna Dass; Yashwardhan Narawane; Rabi N. Mahapatra; Vivek Sarin
Support Vector Machine (SVM) is a supervised machine learning model for classification tasks. Training SVM on a large number of data samples is challenging due to the high computational cost and memory requirement. Hence, model training is supported on a high-performance server which typically runs a sequential training algorithm on centralized data. However, as we move towards massive workloads, it
-
LAWS: Locality-AWare Scheme for Automatic Speech Recognition IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-28 Reza Yazdani; Jose-Maria Arnau; Antonio González
Automatic Speech Recognition (ASR) systems are changing the way people interact with different applications on mobile devices. Fulfilling such user-interactivity requires not only a highly accurate, large-vocabulary recognition system, but also a real-time, energy-efficient solution. However, these ASR systems need high memory bandwidth and power budget, which may be impractical for most of small form-factor
-
Generalized Mixed-Criticality Static Scheduling for Periodic Directed Acyclic Graphs on Multi-Core Processors IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-27 Roberto Medina; Etienne Borde; Laurent Pautet
In safety-critical systems many software components of different criticalities or assurance levels need to interact in a timely manner to keep the system and environment safe. Nowadays, these systems are challenged by technological progress resulting in rapid increases in both software complexity and processing demands. Efficiently designing safety-critical systems subject to stringent timing requirements
-
DAG-Fluid: A Real-Time Scheduling Algorithm for DAGs IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-27 Fei Guan; Jiaqing Qiao; Yu Han
Various scheduling algorithms have been proposed for real-time parallel tasks modeled as a Directed Acyclic Graph (DAG). The capacity augmentation bound is a quantitative metric widely used in this field to compare the algorithms. Among the existing algorithms, the lowest capacity augmentation bound for DAG tasks with implicit deadlines is 2, which has been achieved by federated scheduling. To improve
-
Compiler-Assisted Data Streaming for Regular Code Structures IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-27 Nuno Neves; Pedro Tomás; Nuno Roma
The performance of modern processors is often limited by execution stalls resulting from long memory access latencies. Compile-time optimizations, deep cache hierarchies and prefetching mechanisms already provide significant performance gains, by performing memory accesses in parallel with computation. However, they are reaching a throughput improvement limit. Hence, new solutions that effectively
-
Tetris: Using Software/Hardware Co-Design to Enable Handheld, Physics-Limited 3D Plane-Wave Ultrasound Imaging IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-23 Brendan L. West; Jian Zhou; Ronald G. Dreslinski; Oliver D. Kripfgans; J. Brian Fowlkes; Chaitali Chakrabarti; Thomas F. Wenisch
High volume acquisition rates are imperative for certain medical ultrasound imaging applications, such as 3D elastography and 3D vector flow imaging. As ultrasound imaging transitions from 2D to 3D, the massive data bandwidth and billions of trigonometric operations required to reconstruct each volume leave conventional computer architectures falling short. Despite recent algorithmic improvements
-
Adaptive Model-Based Scheduling in Software Transactional Memory IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-11-19 Pierangelo Di Sanzo; Alessandro Pellegrini; Marco Sannicandro; Bruno Ciciani; Francesco Quaglia
Software Transactional Memory (STM) stands as a powerful concurrent programming paradigm, enabling atomicity and isolation while accessing shared data. On the downside, STM may suffer from performance degradation due to excessive conflicts among concurrent transactions, which waste CPU cycles and energy because of transaction aborts. An approach to cope with this issue consists of putting in
-
Branch Prediction Attack on Blinded Scalar Multiplication IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-09 Sarani Bhattacharya; Clémentine Maurice; Shivam Bhasin; Debdeep Mukhopadhyay
In recent years, performance counters have been used as a side channel source to monitor branch mispredictions, in order to attack cryptographic algorithms. However, the literature considers blinding techniques as effective countermeasures against such attacks. In this article, we present the first template attack on the branch predictor. We target blinded scalar multiplications with a side-channel
-
A Modeling Framework for Reliability of Erasure Codes in SSD Arrays IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-27 Mostafa Kishani; Saba Ahmadian; Hossein Asadi
The emergence of Solid-State Drives (SSDs) has transformed the data storage industry, where they are rapidly replacing Hard Disk Drives (HDDs) due to their superiority in performance and power. Meanwhile, SSDs have reliability issues due to bit errors, bad blocks, and bad chips. To improve reliability, Redundant Array of Independent Disks (RAID) configurations, originally proposed to increase both performance
-
CryptSQLite: SQLite With High Data Security IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31 Yongzhi Wang; Yulong Shen; Cuicui Su; Jiawen Ma; Lingtong Liu; Xuewen Dong
SQLite, one of the most popular lightweight database systems, has been widely used in various systems. However, the compact design of SQLite did not give enough consideration to user data security. Specifically, anyone who has obtained access to the database file will be able to read or tamper with the data. Existing encryption-based solutions can only protect data on storage, while still exposing
-
Incremental Throughput Allocation of Heterogeneous Storage With No Disruptions in Dynamic Setting IEEE Trans. Comput. (IF 2.711) Pub Date : 2019-12-31 ZhiSheng Huo; Limin Xiao; Minyi Guo; Xiaoling Rong
Solid-state drives (SSDs) have been added into storage systems for improving their performance, which will bring the heterogeneity into the storage medium. The throughput is one of the essential resources in heterogeneous storage systems, and how to allocate the throughput plays a crucial role in user performance. There are many types of research on the throughput allocation of heterogeneous storage
-
Fast Encoding Algorithms for Reed–Solomon Codes With Between Four and Seven Parity Symbols IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03 Leilei Yu; Zhichang Lin; Sian-Jheng Lin; Yunghsiang S. Han; Nenghai Yu
This article describes a fast Reed–Solomon encoding algorithm with between four and seven parity symbols. First, we show that the syndrome of Reed–Solomon codes can be computed via the Reed–Muller transform. Based on this result, the fast encoding algorithm is then derived. Analysis shows that the proposed approach asymptotically requires 3 XORs per data bit, representing an improvement over previous
-
All-Digital Control-Theoretic Scheme to Optimize Energy Budget and Allocation in Multi-Cores IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-03 Davide Zoni; Luca Cremona; William Fornaciari
The Internet-of-Things (IoT) revolution fueled new challenges and opportunities to achieve computational efficiency goals. Embedded devices are required to execute multiple applications for which a suitable distribution of the computing power must be adapted at run-time. Such complex hardware platforms have to sustain the continuous acquisition and processing of data under severe energy budget constraints
-
Joint Management of CPU and NVDIMM for Breaking Down the Great Memory Wall IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-06 Chun-Feng Wu; Yuan-Hao Chang; Ming-Chang Yang; Tei-Wei Kuo
NVDIMM is a production-ready device for providing larger memory space at lower cost. However, directly placing NVDIMM as the main memory would seriously degrade the system performance because of the “great memory wall” caused by the fact that in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this article, we present a joint
-
Crossbar-Constrained Technology Mapping for ReRAM Based In-Memory Computing IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-07 Debjyoti Bhattacharjee; Yaswanth Tavva; Arvind Easwaran; Anupam Chattopadhyay
In-memory computing has gained significant attention due to the potential for dramatic improvement in speed and energy. Redox-based resistive RAMs (ReRAMs), capable of non-volatile storage and logic operations simultaneously, have been used for logic-in-memory computing approaches. To this effect, we propose ReRAM based VLIW Architecture for in-Memory comPuting (ReVAMP), supported by a detailed
-
Automated Performance Modeling of HPC Applications Using Machine Learning IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10 Jingwei Sun; Guangzhong Sun; Shiyan Zhan; Jiepeng Zhang; Yong Chen
Automated performance modeling and performance prediction of parallel programs are highly valuable in many use cases, such as in guiding task management and job scheduling, offering insights of application behaviors, and assisting resource requirement estimation. The performance of parallel programs is affected by numerous factors, including but not limited to hardware, applications, algorithms, and
-
A Neural Network Based Fault Management Scheme for Reliable Image Processing IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-01-10 Matteo Biasielli; Cristiana Bolchini; Luca Cassano; Erdem Koyuncu; Antonio Miele
Traditional reliability approaches introduce relevant costs to achieve unconditional correctness during data processing. However, many application environments are inherently tolerant to a certain degree of inexactness or inaccuracy. In this article, we focus on the practical scenario of image processing in space, a domain where faults are a threat, while the applications are inherently tolerant to
-
On Minimizing Internal Data Migrations of Flash Devices via Lifetime-Retention Harmonization IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-22 Ming-Chang Yang; Chun-Feng Wu; Shuo-Han Chen; Yi-Ling Lin; Che-Wei Chang; Yuan-Hao Chang
With the emergence of high-density triple-level-cell (TLC) and 3D NAND flash, the access performance and endurance of flash devices are degraded due to the downscaling of flash cells. In addition, we observe that the mismatch between data lifetime requirement and flash block retention capability could further worsen the access performance and endurance. This is because the “lifetime-retention mismatch”
-
SECRET: Semantically Enhanced Classification of Real-World Tasks IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-22 Ayten Ozge Akmandor; Jorge Ortiz; Irene Manotas; Bongjun Ko; Niraj K. Jha
Supervised machine learning (ML) algorithms are aimed at maximizing classification performance under available energy and storage constraints. They try to map the training data to the corresponding labels while ensuring generalizability to unseen data. However, they do not integrate meaning-based relationships among labels in the decision process. On the other hand, natural language processing (NLP)
-
Specification-Driven Conformance Checking for Virtual/Silicon Devices Using Mutation Testing IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-21 Haifeng Gu; Jianning Zhang; Mingsong Chen; Tongquan Wei; Li Lei; Fei Xie
Modern software systems, either system or application software, are increasingly being developed on top of virtualized software platforms. They may simply intend to execute on virtual machines or they may be expected to port to physical machines eventually. In either case, the devices, virtual or silicon, in the target virtual or physical machines are expected to conform to the specifications based
-
NOSTalgy: Near-Optimum Run-Time STT-MRAM Quality-Energy Knob Management for Approximate Computing Applications IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-21 Arash Salahvarzi; Amir Mahdi Hosseini Monazzah; Mahdi Fazeli; Kevin Skadron
The stochastic switching feature of Spin-Transfer Torque Magnetic RAM (STT-MRAM) provides an attractive knob to trade quality for energy consumption in approximate computing applications. Indeed, the quality of STT-MRAM functionalities (mainly write operation) is increased by consuming more energy to achieve a more stable write. On the other hand, in approximate computing applications, we do not need
-
A Reduced Architecture for ReRAM-Based Neural Network Accelerator and Its Software Stack IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Yu Ji; Zixin Liu; Youhui Zhang
Neural network (NN) accelerators based on resistive random access memory (ReRAM) have been widely investigated as a promising solution to address the memory wall challenge, due to its capability of processing-in-memory with extremely high density. However, the performance of these accelerators is bounded by the peripheral circuits and the interconnection. And they also suffer from accuracy issue and
-
Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Hyeonsu Lee; Hyunjun Kim; Cheolgi Kim; Hwansoo Han; Euiseong Seo
Mission-critical embedded systems simultaneously run multiple graphics-processing-unit (GPU) computing tasks with different criticality and timeliness requirements. Considerable research effort has been dedicated to supporting the preemptive priority scheduling of GPU kernels. However, hardware-supported preemption leads to lengthy scheduling delays and complicated designs, and most software approaches
-
TrackLace: Data Management for Interlaced Magnetic Recording IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Fenggang Wu; Bingzhe Li; Baoquan Zhang; Zhichao Cao; Jim Diehl; Hao Wen; David H.C. Du
Interlaced Magnetic Recording (IMR) is a promising technology which achieves higher data density and lower write amplification (WA) than Shingled Magnetic Recording (SMR). In IMR, top tracks and bottom tracks are interlaced so each bottom track is partially overlapped with two adjacent top tracks. Top tracks can be updated without any WA, but bottom track updates require reading and rewriting of affected
-
Fast and Predictable Non-Volatile Data Memory for Real-Time Embedded Systems IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Mostafa Bazzaz; Ali Hoseinghorban; Alireza Ejlali
Energy consumption and predictability are two important constraints in designing real-time embedded systems and one of the recently proposed solutions for the energy consumption problem is the use of non-volatile memories instead of conventional SRAM due to their lower leakage power consumption and smaller cell area. Furthermore, because of their non-volatile nature, the use of these memories helps
-
Real-Time Schedulability Analysis and Enhancement of Transiently Powered Processors With NVMs IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Dasom Lee; Hyeonseok Jung; Hoeseok Yang
Recent Internet-of-Things or Wireless Sensor Network devices are often operated with energy harvesters. As there is no energy storage in those devices, power is not consistently provided to the devices at all times. In such transiently powered systems, in order to keep the system reliable without losing any execution contexts, non-volatile memories (NVMs) are typically used for swift backup/restoration
-
Area-Optimized Accurate and Approximate Softcore Signed Multiplier Architectures IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Salim Ullah; Hendrik Schmidl; Siva Satyendra Sahoo; Semeen Rehman; Akash Kumar
Multiplication is one of the most extensively used arithmetic operations in a wide range of applications. In order to provide resource-efficient and high-performance multipliers, previous works have proposed different designs of accurate and approximate multipliers, mainly for ASIC-based systems. However, the architectural differences between ASIC- and FPGA-based systems limit the effectiveness of
-
Schnorr-Based Implicit Certification: Improving the Security and Efficiency of Vehicular Communications IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Paulo S. L. M. Barreto; Marcos A. Simplicio; Jefferson E. Ricardini; Harsh Kupwade Patil
In the implicit certification model, the process of verifying the validity of the signer's public key is combined with the verification of the signature itself. When compared to traditional, explicit certificates, the main advantage of the implicit approach lies in the shorter public key validation data. This property is particularly important in resource-constrained scenarios where public key validation
-
Software-Defined Design Space Exploration for an Efficient DNN Accelerator Architecture IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Ye Yu; Yingmin Li; Shuai Che; Niraj K. Jha; Weifeng Zhang
Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware. The problem gets worse as the size of neural networks grows exponentially
-
HEAWS: An Accelerator for Homomorphic Encryption on the Amazon AWS FPGA IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Furkan Turan; Sujoy Sinha Roy; Ingrid Verbauwhede
Homomorphic Encryption makes privacy-preserving computing possible in a third-party-owned cloud by enabling computation on the encrypted data of users. However, software implementations of homomorphic encryption are very slow on general purpose processors. With the emergence of ‘FPGAs as a service’, hardware acceleration of computationally heavy workloads in the cloud is getting popular. In this article
-
DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Samet E. Arda; Anish Krishnakumar; A. Alper Goksoy; Nirmal Kumbhare; Joshua Mack; Anderson L. Sartor; Ali Akoglu; Radu Marculescu; Umit Y. Ogras
Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing
-
WooKong: A Ubiquitous Accelerator for Recommendation Algorithms With Custom Instruction Sets on FPGA IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-20 Chao Wang; Lei Gong; Xiang Ma; Xi Li; Xuehai Zhou
Recommendation algorithms, such as Neighborhood-based Collaborative Filtering (CF), have been widely applied in various emerging machine learning applications. However, the explosive growth of big data poses significant challenges to CF recommendation algorithms, which are becoming quite time- and energy-consuming. They have to be optimized and accelerated by powerful engines to process
-
Predicting the Health Degree of Hard Disk Drives With Asymmetric and Ordinal Deep Neural Models IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-16 Fernando D. S. Lima; Francisco Lucas F. Pereira; Iago C. Chaves; Javam C. Machado; Joao Paulo P. Gomes
Predicting failures in Hard Disk Drives (HDDs) is a major challenge that has been faced by both industry and academia in recent years. Being able to predict failure events may help avoid data loss and improve service availability. Among all failure prediction strategies, health degree prediction is one of the most popular. The task of health degree prediction consists of, given a finite
-
Tiler: An Autonomous Region-Based Scheme for SMR Storage IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-16 Chenlin Ma; Zhaoyan Shen; Jihe Wang; Yi Wang; Renhai Chen; Yong Guan; Zili Shao
Shingled Magnetic Recording (SMR) disks are adopted as a high-density, non-volatile medium that significantly surpasses conventional disks in both storage capacity and cost. However, inefficient read-modify-writes (RMWs) greatly challenge the management of SMR disks. This article presents, for the first time, an approach called Tiler to manage SMR disks by dividing the physical space into small autonomous
-
Modularized Morphing of Deep Convolutional Neural Networks: A Graph Approach IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-16 Tao Wei; Changhu Wang; Chang Wen Chen
Network morphism is an effective learning scheme to morph a well-trained neural network into a new one with the network function completely preserved. However, existing network morphism schemes address only basic morphing types at the layer level. In this research, we address the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module
-
On the Analysis of Parallel Real-Time Tasks With Spin Locks IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-15 Xu Jiang; Nan Guan; He Du; Weichen Liu; Wang Yi
A locking protocol is an essential component in the resource management of real-time systems, coordinating mutually exclusive accesses to shared resources from different tasks. Although the design and analysis of locking protocols have been intensively studied for sequential real-time tasks, there has been little work on this topic for parallel real-time tasks. In this article, we study the analysis
-
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-15 Fabian Schuiki; Florian Zaruba; Torsten Hoefler; Luca Benini
Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/stores necessary to feed their ALU/FPU. Each instruction spent on moving data is a cycle not spent on computation, limiting ALU/FPU utilization to 33 percent on reductions. We propose “Stream Semantic Registers” to boost utilization and increase energy
-
Enforcing Predictability of Many-Cores With DCFNoC IEEE Trans. Comput. (IF 2.711) Pub Date : 2020-04-15 Tomás Picornell; José Flich; Carles Hernández; José Duato
The ever-growing need for higher performance forces industry to include technology based on multi-processor systems-on-chip (MPSoCs) in their safety-critical embedded systems. MPSoCs include a network-on-chip (NoC) to interconnect the cores with one another, with memory, and with the rest of the shared resources. Unfortunately, the inclusion of NoCs compromises guaranteeing time predictability as network-level conflicts
Contents have been reproduced by permission of the publishers.