-
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-03-13 Changmin Shin, Taehee Kwon, Jaeyong Song, Jae Hyung Ju, Frank Liu, Yeonkyu Choi, Jinho Lee
-
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-03-06 Deepanjali Mishra, Konstantinos Kanellopoulos, Ashish Panwar, Akshitha Sriraman, Vivek Seshadri, Onur Mutlu, Todd C. Mowry
-
Exploiting Direct Memory Operands in GPU Instructions IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-03-05 Ali Mohammadpur-Fard, Sina Darabi, Hajar Falahati, Negin Mahani, Hamid Sarbazi-Azad
-
Achieving Forward Progress Guarantee in Small Hardware Transactions IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-28 Mahita Nagabhiru, Gregory T. Byrd
-
FullPack: Full Vector Utilization for Sub-Byte Quantized Vector-Matrix Multiplication on General Purpose CPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-27 Hossein Katebi, Navidreza Asadi, Maziar Goudarzi
-
JANM-IK: Jacobian Argumented Nelder-Mead Algorithm for Inverse Kinematics and Its Hardware Acceleration IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-26 Yuxin Yang, Xiaoming Chen, Yinhe Han
-
Improving Energy-efficiency of Capsule Networks on Modern GPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-23 Mohammad Hafezan, Ehsan Atoofian
-
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-07 Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C. Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering
-
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-02-05 Lieven Eeckhout
-
Baobab Merkle Tree for Efficient Secure Memory IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-01-31 Samuel Thomas, Kidus Workneh, Ange-Thierry Ishimwe, Zack McKevitt, Phaedra Curlin, R. Iris Bahar, Joseph Izraelevitz, Tamara Lehman
Secure memory is a natural solution to hardware vulnerabilities in memory, but it faces fundamental challenges of performance and memory overheads. While significant work has gone into optimizing the protocol for performance, far less work has gone into optimizing its memory overhead. In this work, we propose the Baobab Merkle Tree, in which counters are memoized in an on-chip table. The Baobab Merkle
-
Primate: A Framework to Automatically Generate Soft Processors for Network Applications IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-01-26 Rui Ma, Jia-Ching Hsu, Ali Mansoorshahi, Joseph Garvey, Michael Kinsner, Deshanand Singh, Derek Chiou
-
Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-01-19 Asif Ali Khan, Fazal Hameed, Taha Shahroodi, Alex K. Jones, Jeronimo Castrillon
-
Direct-Coding DNA With Multilevel Parallelism IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-01-17 Caden Corontzos, Eitan Frachtenberg
The cost and time to sequence entire genomes have been on a steady and rapid decline since the early 2000s, leading to an explosion of genomic data. In contrast, the growth rates for digital storage device capacity, CPU clock speed, and networking bandwidth have been much more moderate. This gap means that the need for storing, transmitting, and processing sequenced genomic data is outpacing the capacities
-
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2024-01-17 Christodoulos Peltekis, Vasileios Titopoulos, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos
Deep Learning (DL) has achieved unprecedented success in various application domains. Meanwhile, model pruning has emerged as a viable solution to reduce the footprint of DL models in mobile applications, without compromising their accuracy. To enable the matrix engines built for dense DL models to also handle their pruned counterparts, pruned DL models follow a fine-grained structured sparsity pattern
-
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-12-11 Yang-Gon Kim, Yun-Ki Han, Jae-Kang Shin, Jun-Kyum Kim, Lee-Sup Kim
Deep Reinforcement Learning (DRL) plays a critical role in controlling future intelligent machines like robots and drones. Constantly retrained by newly arriving real-world data, DRL provides optimal autonomous control solutions for adapting to ever-changing environments. However, DRL repeats inference and training that are computationally expensive on resource-constrained mobile/embedded platforms
-
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-12-11 Courtney Golden, Dan Ilan, Caroline Huang, Niansong Zhang, Zhiru Zhang, Christopher Batten
Recent work has explored compute-in-SRAM as a promising approach to overcome the traditional processor-memory performance gap. The recently released Associative Processing Unit (APU) from GSI Technology is, to our knowledge, the first commercial compute-in-SRAM accelerator. Prior work on this platform has focused on domain-specific acceleration using direct microcode programming and/or specialized
-
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-12-06 Ramin Ayanzadeh, Moinuddin Qureshi
Analog Quantum Computers (QCs), such as D-Wave's Quantum Annealers (QAs) and QuEra's neutral atom platform, rival their digital counterparts in computing power. Existing QAs boast over 5,700 qubits, but their single-instruction operation model prevents using SWAP operations for making physically distant qubits adjacent. Instead, QAs use an embedding process to chain multiple physical qubits together
-
Tulip: Turn-Free Low-Power Network-on-Chip IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-12-05 Atiyeh Gheibi-Fetrat, Negar Akbarzadeh, Shaahin Hessabi, Hamid Sarbazi-Azad
The semiconductor industry has seen significant technological advancements, leading to an increase in the number of processing cores in a system-on-chip (SoC). To facilitate communication among the numerous on-chip cores, a network-on-chip (NoC) is employed. One of the main challenges of designing NoCs is power management since the NoC consumes a significant portion of the total power of the SoC. Among
-
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-11-28 Hyeseong Kim, Yunjae Lee, Minsoo Rhu
Deep neural network (DNN)-based recommendation systems (RecSys) are one of the most successfully deployed machine learning applications in commercial services for predicting ad click-through rates or rankings. While numerous prior works have explored hardware and software solutions to reduce the training time of RecSys, its end-to-end training pipeline including the data preprocessing stage has received
-
Redundant Array of Independent Memory Devices IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-11-20 Peiyun Wu, Trung Le, Zhichun Zhu, Zhao Zhang
DRAM reliability is an increasing concern, as recent studies have found. In this letter, we propose RAIMD (Redundant Array of Independent Memory Devices), an energy-efficient memory organization with RAID-like error protection. In this organization, each memory device works as an independent memory module to serve a whole memory request and to support error detection and error recovery. It relies
-
Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-11-13 Jonathan Garcia-Mallen, Shuohao Ping, Alex Miralles-Cordal, Ian Martin, Mukund Ramakrishnan, Yipeng Huang
We discuss our preliminary results in building a configurable accelerator for differential equation time stepping and iterative methods for algebraic equations. Relative to prior efforts in building hardware accelerators for numerical methods, our focus is on the following: 1) Demonstrating a higher order of numerical convergence that is needed to actually support existing numerical algorithms. 2)
-
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-11-01 João Vieira, Nuno Roma, Gabriel Falcao, Pedro Tomás
Attaining the performance and efficiency levels required by modern applications often requires the use of application-specific accelerators. However, writing synthesizable Register-Transfer Level code for such accelerators is a complex, expensive, and time-consuming process, which is cumbersome for early architecture development phases. To tackle this issue, a pre-synthesis simulation toolchain is
-
Architectural Security Regulation IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-10-31 Adam Hastings, Ryan Piersma, Simha Sethumadhavan
Across the world, governments are instituting regulations with the goal of improving the state of computer security. In this paper, we propose how security regulation can be formulated and implemented at the architectural level. Our proposal, called FAIRSHARE, requires architects to spend a pre-determined fraction of system resources (e.g., execution cycles) towards security but leaves the decision
-
A Quantum Computer Trusted Execution Environment IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-10-19 Theodoros Trochatos, Chuanqi Xu, Sanjay Deshpande, Yao Lu, Yongshan Ding, Jakub Szefer
We present the first architecture for a trusted execution environment for quantum computers. In the architecture, to protect the user's circuits, they are obfuscated with decoy control pulses added during circuit transpilation by the user. The decoy pulses are removed, i.e. attenuated, by the trusted hardware inside the superconducting quantum computer's fridge before they reach the qubits. This preliminary
-
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-10-13 Hailong Li, Jaewan Choi, Yongsuk Kwon, Jung Ho Ahn
Transformer-based models have become the backbone of numerous state-of-the-art natural language processing (NLP) tasks, including large language models. Matrix multiplication, a fundamental operation in the Transformer-based models, accounts for most of the execution time. While singular value decomposition (SVD) can accelerate this operation by reducing the amount of computation and memory footprints
-
Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-10-09 Yosuke Ueno, Yuna Tomida, Teruo Tanimoto, Masamitsu Tanaka, Yutaka Tabuchi, Koji Inoue, Hiroshi Nakamura
The bandwidth limit between cryogenic and room-temperature environments is a critical bottleneck in superconducting noisy intermediate-scale quantum computers. This paper presents the first trial of algorithm-aware system-level optimization to solve this issue by targeting the quantum approximate optimization algorithm. Our counter-based cryogenic architecture using single-flux quantum logic shows
-
NoHammer: Preventing Row Hammer With Last-Level Cache Management IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-09-29 Seunghak Lee, Ki-Dong Kang, Gyeongseo Park, Nam Sung Kim, Daehoon Kim
Row Hammer (RH) is a circuit-level phenomenon where repetitive activation of a DRAM row causes bit-flips in adjacent rows. Prior studies that rely on extra refreshes to mitigate RH vulnerability demonstrate that bit-flips can be prevented effectively. However, its implementation is challenging due to the significant performance degradation and energy overhead caused by the additional extra refresh
-
Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-09-22 Yonghae Kim, Anurag Kar, Jaewon Lee, Jaekyu Lee, Hyesoon Kim
Software attacks typically operate by overwriting control data, such as a return address and a function pointer, and hijacking the control flow of a program. To prevent such attacks, a number of control-flow integrity (CFI) solutions have been proposed. Nevertheless, most prior work finds difficulties in serving two ends: performance and security. In particular, protecting forward edges, i.e., indirect
-
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-09-25 Pau Escofet, Anabel Ovide, Carmen G. Almudever, Eduard Alarcón, Sergi Abadal
Modular quantum computing architectures offer a promising alternative to monolithic designs for overcoming the scaling limitations of current quantum computers. To achieve scalability beyond small prototypes, quantum architectures are expected to adopt a modular approach, featuring clusters of tightly connected quantum bits with sparser connections between these clusters. Efficiently distributing qubits
-
Fast Performance Prediction for Efficient Distributed DNN Training IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-09-18 Yugyoung Yun, Eunhyeok Park
Training large-scale DNN models requires parallel distributed training using hyper-scale systems. To make the best use of the numerous accelerators, it is essential to intelligently combine different parallelization schemes. However, as the size of DNN models increases, the possible combinations of schemes become enormous, and consequently, finding the optimal parallel plan becomes exceedingly expensive
-
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-09-08 Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout
MCM-GPUs scale performance by integrating multiple chiplets within the same package. How to partition the aggregate compute resources across chiplets poses a fundamental trade-off in performance versus cost and sustainability. We propose the Performance Per Wafer (PPW) metric to explore this trade-off and we find that while performance is maximized with few large chiplets, and while cost and environmental
-
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-25 Yun-Chen Lo, Yu-Chih Tsai, Ren-Shuo Liu
Computing latency is an important system metric for Deep Neural Network (DNN) accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point engine (FP-PE), which makes the following key contributions: 1) an approximate bit-versatile multiplier-and-accumulate (BV-MAC) unit with an early shifter and 2) an on-demand fixed-point-to-floating-point conversion (FXP2FP) unit
-
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-16 Lingfei Lu, Yudi Qiu, Shiyan Yi, Yibo Fan
Personalized recommendation systems (RS) are widely used in industry and occupy much of the compute time in AI computing centers. A critical component of an RS is the embedding layer, which consists of sparse embedding lookups and is memory-bound. Recent works have proposed near-memory processing (NMP) architectures to utilize high inner-memory bandwidth to speed up embedding lookups. These NMP works
-
Characterizing and Understanding Defense Methods for GNNs on GPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-15 Meng Wu, Mingyu Yan, Xiaocheng Yang, Wenming Li, Zhimin Zhang, Xiaochun Ye, Dongrui Fan
Graph neural networks (GNNs) are widely deployed in many vital fields, but suffer from adversarial attacks, which seriously compromise the security in these fields. Plenty of defense methods have been proposed to mitigate the impact of these attacks; however, they have introduced extra time-consuming stages into the execution of GNNs. These extra stages need to be accelerated because the end-to-end
-
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-15 Jaewan Choi, Jaehyun Park, Kwanhee Kyung, Nam Sung Kim, Jung Ho Ahn
Transformer-based generative models, such as GPT, summarize an input sequence by generating key/value (KV) matrices through attention and generate the corresponding output sequence by utilizing these matrices once per token of the sequence. Both input and output sequences tend to get longer, which improves the understanding of contexts and conversation quality. These models are also typically batched
-
By-Software Branch Prediction in Loops IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-11 Maziar Goudarzi, Reza Azimi, Julian Humecki, Faizaan Rehman, Richard Zhang, Chirag Sethi, Tanishq Bomman, Yuqi Yang
Load-Dependent Branches (LDB) often do not exhibit regular patterns in their local or global history and thus are inherently hard to predict correctly by conventional branch predictors. We propose a software-to-hardware branch pre-resolution mechanism that allows software to pass branch outcomes to the processor frontend ahead of fetching the branch instruction. A compiler pass identifies the instruction
-
Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-10 Justin Feng, Fatemeh Arkannezhad, Christopher Ryu, Enoch Huang, Siddhant Gupta, Nader Sehatbakhsh
An important step to protect software against side-channel vulnerability is to rigorously evaluate it on the target hardware using standard leakage tests. Recently, leakage estimation tools have received a lot of attention to improve this time-consuming process. Despite their advancements, existing tools often neglect the impact of microarchitecture and its underlying events in their leakage model
-
SoCurity: A Design Approach for Enhancing SoC Security IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-08-03 Naorin Hossain, Alper Buyuktosunoglu, John-David Wellman, Pradip Bose, Margaret Martonosi
We propose SoCurity, the first NoC counter-based hardware monitoring approach for enhancing heterogeneous SoC security. With SoCurity, we develop a fast, lightweight anomalous activity detection system leveraging semi-supervised machine learning models that require no prior attack knowledge for detecting anomalies. We demonstrate our techniques with a case study on a real SoC for a connected autonomous
-
The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-07-21 Gururaj Saileshwar, Moinuddin Qureshi
This letter studies common modeling pitfalls in security analyses of hardware defenses to highlight the importance of accurate reproduction of defenses. We provide a case study of MIRAGE (Saileshwar and Qureshi, 2021), a defense against cache side channel attacks, and analyze its incorrect modeling in a recent work (Chakraborty et al., 2023) that claimed to break its security. We highlight several modeling
-
Exploring the Latency Sensitivity of Cache Replacement Policies IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-07-19 Ahmed Nematallah, Chang Hyun Park, David Black-Schaffer
With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metadata, which often require multiple cycles. To minimize the negative impact of these metadata updates, architects have focused on policies that incur as little update
-
X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-07-17 Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn
The demand for accurate information about the internal structure and characteristics of DRAM has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official documents, making it difficult to
-
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-07-03 Ipoom Jeong, Jiaqi Lou, Yongseok Son, Yongjoo Park, Yifan Yuan, Nam Sung Kim
The advancement in I/O technology has posed an unprecedented demand for high-performance processing on I/O data, leading to the development of Data Direct I/O (DDIO) technology. DDIO improves I/O processing efficiency by directly injecting all inbound I/O data into the last-level cache (LLC) in cooperation with any type of I/O device. Nonetheless, in certain scenarios with more than one I/O application
-
Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-28 Kiseok Jeon, Junghee Lee, Bumsoo Kim, James J. Kim
As the value of Bitcoin increases, the difficulty level of mining keeps increasing. This is generally addressed with application-specific integrated circuits (ASICs), but block candidates are still created by software. The relative overhead of block candidate generation is growing because the hash computation is accelerated by ASICs. Additionally, it is getting harder to find the target nonce; if it
-
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-28 Hwanjun Lee, Seunghak Lee, Yeji Jung, Daehoon Kim
New memory interconnect technology, such as Intel's Compute Express Link (CXL), helps to expand memory bandwidth and capacity by adding CPU-less NUMA nodes to the main memory system, addressing the growing memory wall challenge. Consequently, modern computing systems embrace the heterogeneity in memory systems, composing the memory systems with a tiered memory system with near and far memory (e.g.
-
Guard Cache: Creating Noisy Side-Channels IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-27 Fernando Mosquera, Krishna Kavi, Gayatri Mehta, Lizy John
Microarchitectural innovations such as deep cache hierarchies, out-of-order execution, branch prediction and speculative execution have made possible the design of processors that meet ever-increasing demands for performance. However, these innovations have inadvertently introduced vulnerabilities, which are exploited by side-channel attacks and attacks relying on speculative executions. Mitigating
-
DVFaaS: Leveraging DVFS for FaaS Workflows IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-20 Achilleas Tzenetopoulos, Dimosthenis Masouros, Dimitrios Soudris, Sotirios Xydis
In this letter, we propose DVFaaS, a per-core DVFS framework that utilizes control systems theory to assign just-enough frequency for the purpose of addressing the QoS requirements on serverless workflows comprising unseen functions. DVFaaS exploits the intermittent nature of serverless workflows, which enables staged control on distinguishable functions, which jointly contribute to the end-to-end
-
Toward Practical 128-Bit General Purpose Microarchitectures IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-20 Chandana S. Deshpande, Arthur Perais, Frédéric Pétrot
Intel introduced 5-level paging mode to support 57-bit virtual address space in 2017. This, coupled to paradigms where backup storage can be accessed through load and store instructions (e.g., non volatile memories), lets us envision a future in which a 64-bit address space has become insufficient. In that event, the straightforward solution would be to adopt a flat 128-bit address space. In this early
-
Design of a High-Performance, High-Endurance Key-Value SSD for Large-Key Workloads IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-06-02 Chanyoung Park, Chun-Yi Liu, Kyungtae Kang, Mahmut Kandemir, Wonil Choi
Current KV-SSD design assumes a specific range of typical workloads, where the size of values is quite large while that of keys is relatively small. However, we find that (i) there exists another spectrum of workloads, whose key sizes are relatively large compared to their value sizes, and (ii) the current KV-SSD design suffers from long tail latencies and low storage utilization under such large-key
-
Towards Improved Power Management in Cloud GPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-22 Pratyush Patel, Zibo Gong, Syeda Rizvi, Esha Choukse, Pulkit Misra, Thomas Anderson, Akshitha Sriraman
As modern server GPUs are increasingly power intensive, better power management mechanisms can significantly reduce the power consumption, capital costs, and carbon emissions in large cloud datacenters. This letter uses diverse datacenter workloads to study the power management capabilities of modern GPUs. We find that current GPU management mechanisms have limited compatibility and monitoring support
-
The Jaseci Programming Paradigm and Runtime Stack: Building Scale-Out Production Applications Easy and Fast IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-18 Jason Mars, Yiping Kang, Roland Daynauth, Baichuan Li, Ashish Mahendra, Krisztian Flautner, Lingjia Tang
Today's production scale-out applications include many sub-application components, such as storage backends, logging infrastructure and AI models. These components have drastically different characteristics, are required to work in collaboration, and interface with each other as microservices. This leads to increasingly high complexity in developing, optimizing, configuring, and deploying scale-out
-
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-16 Anurag Kar, Xueyang Liu, Yonghae Kim, Gururaj Saileshwar, Hyesoon Kim, Tushar Krishna
Recent CPU microarchitectural attacks utilize contention over the NoC to mount covert and side-channel attacks on multicore CPUs and leak information from victim applications. We propose NoIR, a dynamic LLC slice selection mechanism using slice remapping to obfuscate interconnect contention patterns. NoIR reduces contention variance by 92.18% and mean IPC degradation due to cache invalidation is limited
-
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-12 Samer Kurzum, Gil Shomron, Freddy Gabbay, Uri Weiser
Deep neural networks (DNNs) require abundant multiply-and-accumulate (MAC) operations. Thanks to DNNs’ ability to accommodate noise, some of the computational burden is commonly mitigated by quantization, that is, by using lower-precision floating-point operations. Layer granularity is the preferred method, as it is easily mapped to commodity hardware. In this paper, we propose Dynamic Asymmetric Architecture
-
Hardware-Implemented Lightweight Accelerator for Large Integer Polynomial Multiplication IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-10 Pengzhou He, Yazheng Tu, Çetin Kaya Koç, Jiafeng Xie
Large integer polynomial multiplication is frequently used as a key component in post-quantum cryptography (PQC) algorithms. Following the trend that efficient hardware implementation for PQC is emphasized, in this letter, we propose a new hardware-implemented lightweight accelerator for the large integer polynomial multiplication of Saber (one of the National Institute of Standards and Technology
-
In-Memory Versioning (IMV) IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-05-05 David Andrew Roberts, Haojie Ye, Tony Brewer, Sean Eilert
In this letter, we propose and evaluate designs for a novel hardware-assisted data versioning system (in-memory versioning or IMV) in the context of high-performance computing. Our main novelty and advantage over recent published work is that it does not require any changes to host processor logic, instead augmenting a memory controller within memory modules. It is faster and more efficient than existing
-
Kobold: Simplified Cache Coherence for Cache-Attached Accelerators IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-04-21 Jennifer Brana, Brian C. Schwedock, Yatin A. Manerkar, Nathan Beckmann
The ever-increasing cost of data movement in computer systems is driving a new era of data-centric computing. One of the most common data-centric paradigms is near-data computing (NDC), where accelerators are placed inside the memory hierarchy to avoid the costly transfer of data to the core. NDC systems show immense potential to improve performance and energy efficiency. Unfortunately, adding accelerators
-
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-04-19 Jackson Melchert, Keyi Zhang, Yuchen Mei, Mark Horowitz, Christopher Torng, Priyanka Raina
The architecture of a coarse-grained reconfigurable array (CGRA) interconnect has a significant effect on not only the flexibility of the resulting accelerator, but also its power, performance, and area. Design decisions that have complex trade-offs need to be explored to maintain efficiency and performance across a variety of evolving applications. This paper presents Canal, a Python-embedded domain-specific
-
SmartIndex: Learning to Index Caches to Improve Performance IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-04-05 Kevin Weston, Farabi Mahmud, Vahid Janfaza, Abdullah Muzahid
Modern computers rely heavily on caches to achieve higher performance. Unfortunately, a cache indexing scheme can often cause an uneven distribution of addresses across cache sets, resulting in many evictions of useful cache blocks. To address this issue, we propose SmartIndex, a self-optimized indexing scheme that leverages machine learning to actively learn the memory access pattern and dynamically
-
An Intermediate Language for General Sparse Format Customization IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-03-28 Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, Zhiru Zhang
The inevitable trend of hardware specialization drives an increasing use of custom data formats in processing sparse workloads, which are typically memory-bound. These formats facilitate the automated generation of target-aware data layouts to improve memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively
-
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-03-13 Xia Zhao, Guangda Zhang, Lu Wang, Yangmei Li, Yongjun Zhang
GPU chip module count is expected to keep increasing to meet the strong scaling demands of parallel applications. In many-chip-module GPUs, memory access latency seriously limits the performance since the transferring latency between different GPU modules is very high, which cannot be easily hidden by switching between different ready threads. To handle this problem, we propose RouteReplies, which
-
Energy-Efficient Bayesian Inference Using Bitstream Computing IEEE Comput. Archit. Lett. (IF 2.3) Pub Date : 2023-02-14 Soroosh Khoram, Kyle Daruwalla, Mikko Lipasti
Uncertainty quantification is critical to many machine learning applications, especially in mobile and edge computing tasks like self-driving cars, robots, and mobile devices. Bayesian Neural Networks can provide this uncertainty quantification, but they come at extra computation cost. However, power and energy can be limited at the edge. In this work, we propose using stochastic bitstream