-
PROV-IO$^+$: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Runzhou Han, Mai Zheng, Suren Byna, Houjun Tang, Bin Dong, Dong Dai, Yong Chen, Dongkyun Kim, Joseph Hassoun, David Thorsley
-
DMA-assisted I/O for Persistent Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Dingding Li, Weijie Zhang, Mianxiong Dong, Kaoru Ota
-
Analytical Modeling and Throughput Computation of Blockchain Sharding IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-12 Pourya Soltani, Farid Ashtiani
-
Revisiting PM-based B$^{+}$-Tree with Persistent CPU Cache IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Bowen Zhang, Shengan Zheng, Liangxu Nie, Zhenlin Qi, Hongyi Chen, Linpeng Huang, Hong Mei
-
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-cores IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Fan Yuan, Xiaojian Yang, Shengguo Li, Dezun Dong, Chun Huang, Zheng Wang
-
FHVAC: Feature-level Hybrid Video Adaptive Configuration for Machine-centric Live Streaming IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Yuanhong Zhang, Weizhan Zhang, Haipeng Du, Caixia Yan, Li Liu, Qinghua Zheng
-
Critique of “Productivity, Portability, Performance: Data-Centric Python” by SCC Team From Sun Yat-sen University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Han Huang, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen, Dan Huang
-
HitGNN: High-throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-28 Yi-Chien Lin, Bingyi Zhang, Viktor Prasanna
-
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-22 Zhe Wang, Jia Hu, Geyong Min, Zhiwei Zhao, Zi Wang
One fundamental problem of content caching in edge computing is how to replace contents in edge servers with limited capacities to meet the dynamic requirements of users without knowing their preferences in advance. Recently, online deep reinforcement learning (DRL)-based caching methods have been developed to address this problem by learning an edge cache replacement policy using samples collected
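To ground the replacement decision the learned policy has to make, here is a minimal sketch assuming a fixed-capacity edge cache with an epsilon-greedy, per-content value estimate; the class name EdgeCache and the toy request trace are hypothetical, and this is a simplified baseline rather than the paper's offline-online DRL method.

```python
# Minimal sketch (not the paper's offline-online DRL method): an epsilon-greedy
# replacement policy for an edge cache of fixed capacity. EdgeCache and the toy
# request trace are hypothetical names used only for illustration.
import random
from collections import defaultdict

class EdgeCache:
    def __init__(self, capacity, epsilon=0.1):
        self.capacity = capacity
        self.epsilon = epsilon
        self.cache = set()
        self.value = defaultdict(float)  # online estimate of each content's hit value

    def request(self, content):
        hit = content in self.cache
        self.value[content] += 1.0       # learn from observed requests
        if not hit:
            if len(self.cache) >= self.capacity:
                self.evict()
            self.cache.add(content)
        return hit

    def evict(self):
        if random.random() < self.epsilon:                    # explore
            victim = random.choice(sorted(self.cache))
        else:                                                 # exploit: drop lowest value
            victim = min(self.cache, key=lambda c: self.value[c])
        self.cache.remove(victim)

cache = EdgeCache(capacity=3)
trace = ["a", "b", "a", "c", "a", "d", "b", "a", "e", "a"]   # skewed toy workload
hits = sum(cache.request(c) for c in trace)
print(f"hit ratio: {hits / len(trace):.2f}")
```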
-
Byzantine-Tolerant Causal Ordering for Unicasts, Multicasts, and Broadcasts IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Anshuman Misra, Ajay D. Kshemkalyani
-
Analysis and Reproducibility of “Productivity, Portability, Performance: Data-Centric Python” IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Christopher Lompa, Piotr Luczynski
-
High Throughput Lattice-Based Signatures on GPUs: Comparing Falcon and Mitaka IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting efforts to optimize the implementation of
-
INT-label: Lightweight In-band Network-Wide Telemetry via Distributed Labeling IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Enge Song, Tian Pan, Haoyu Song, Qiang Fu, Yingjiang Liu, Chenhao Jia, Chuanying Yuan, Minglan Gao, Jiao Zhang, Tao Huang, Yunjie Liu
-
End-to-End Bayesian Networks Exact Learning in Shared Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Subhadeep Karan, Zainul Abideen Sayed, Jaroslaw Zola
Bayesian networks are important Machine Learning models with many practical applications in, e.g., biomedicine and bioinformatics. The problem of Bayesian network learning is $\mathcal{NP}$-hard and computationally challenging. In this article, we propose practical parallel exact algorithms to learn Bayesian networks from data. Our approach uses shared-memory task parallelism to realize exploration
-
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-19 Ke Cheng, Sheng Zhang, Meizhao Liu, Yingcheng Gu, Liu Wei, Huanyu Cheng, Kai Liu, Yu Song, Xiaohang Shi, Andong Zhu, Lei Tang
Deploying microservice instances in geo-distributed edge clouds which are located at the network edge and in proximity to end-users can provide on-site processing, thereby improving the quality of service (QoS). To accommodate the time-varying request arrival rate of each edge cloud, the deployment scheme of microservice instances is dynamically adapted, which is called microservice autoscaling. However
-
Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Burak Aksar, Efe Sencan, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim Brandt, Brian Kulis, Manuel Egele, Ayse K. Coskun
With the increasing scale and complexity of High-Performance Computing (HPC) systems, performance variations in applications caused by anomalies have become significant bottlenecks in system health and operational efficiency. As we move towards exascale systems, these variations become more prominent due to the increased sharing of resources. Such variations lead to lower energy efficiency and higher
-
Joint Optimization of Parallelism and Resource Configuration for Serverless Function Steps IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Zhaojie Wen, Qiong Chen, Yipei Niu, Zhen Song, Quanfeng Deng, Fangming Liu
Function-as-a-Service (FaaS) offers a fine-grained resource provision model, enabling developers to build highly elastic cloud applications. User requests are handled by a series of serverless functions step by step, which forms a multi-step workflow. The developers are required to set proper configurations for functions to meet service level objectives (SLOs) and save costs. However, developing the
-
X-Shard: Optimistic Cross-Shard Transaction Processing for Sharding-Based Blockchains IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-01 Jie Xu, Yulong Ming, Zihan Wu, Cong Wang, Xiaohua Jia
Recent advances in cryptocurrencies have sparked significant interest in blockchain technology. However, scalability issues remain a major challenge for wide adoption of blockchains. Sharding is a promising approach to scale blockchains, but existing sharding-based blockchains fail to achieve expected performance gains due to limitations in cross-shard transaction processing. In this paper, we propose
-
An Offline-Transfer-Online Framework for Cloud-Edge Collaborative Distributed Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Tianyu Zeng, Xiaoxi Zhang, Jingpu Duan, Chao Yu, Chuan Wu, Xu Chen
-
Multi-Agent Deep Reinforcement Learning Framework for Renewable Energy-Aware Workflow Scheduling on Distributed Cloud Data Centers IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Amanda Jayanetti, Saman Halgamuge, Rajkumar Buyya
The ever-increasing demand for the cloud computing paradigm has resulted in the widespread deployment of multiple datacenters, the operations of which consume very high levels of energy. The carbon footprint resulting from these operations threatens environmental sustainability while the increased energy costs have a direct impact on the profitability of cloud providers. Using renewable energy sources
-
CloudSimPer: Simulating Geo-Distributed Datacenters Powered by Renewable Energy Mix IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jie Song, Peimeng Zhu, Yanfeng Zhang, Ge Yu
Nowadays, studies on energy-efficient datacenters, especially DataCenters powered by a Renewable Energy mix (DCRE), have gained great attention. DCREs are large-scale, geo-distributed, and equipped with on-site renewable energy generators. Because of these features, it is expensive to perform empirical evaluations of proposed algorithms and solutions on real-world DCREs, while the state-of-the-art datacenter
-
EvoGWP: Predicting Long-Term Changes in Cloud Workloads Using Deep Graph-Evolution Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jialun Li, Jieqian Yao, Danyang Xiao, Diying Yang, Weigang Wu
Workload prediction plays a crucial role in resource management of large scale cloud datacenters. Although quite a number of methods/algorithms have been proposed, long-term changes have not been explicitly identified and considered. Due to shifting user demands, workload re-locations, or other reasons, the “resource usage pattern” of a workload, which is usually quite stable in a short-term view, may
-
Synergistically Rebalancing the EDP of Container-Based Parallel Applications IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Vinicius S. da Silva, Everton C. de Lima, Janaína Schwarzrock, Fábio D. Rossi, Marcelo C. Luizelli, Antonio Carlos S. Beck, Arthur F. Lorenzon
The use of containers has become standard in cloud environments. However, many parallel applications in containers will not show gains proportional to the extra available hardware. This inefficient use of hardware naturally leads to wasted energy. With that in mind, we propose TT-Autoscaling. It works at two different levels: a) in the container, by automatically and transparently tuning
-
Reproducing Performance of Data-Centric Python by SCC Team From National Tsing Hua University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-18 Fu-Chiang Chang, En-Ming Huang, Pin-Yi Kuo, Chan-Yu Mou, Hsu-Tzu Ting, Pang-Ning Wu, Jerry Chou
-
Suppressing the Interference within a Datacenter: Theorems, Metric and Strategy IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen, Yungang Bao
-
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Guangjing Huang, Qiong Wu, Peng Sun, Qian Ma, Xu Chen
As a privacy-preserving distributed learning paradigm, federated learning (FL) enables multiple client devices to train a shared model without uploading their local data. To further enhance the privacy protection performance of FL, differential privacy (DP) has been successfully incorporated into FL systems to defend against privacy attacks from adversaries. In FL with DP, how to stimulate efficient
-
Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-10 Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good performance on large and regular-shaped GEMMs, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in emerging HPC applications. Recent research has focused on improving GEMM performance
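For readers unfamiliar with the kernel in question, below is a minimal sketch of GEMM: a naive triple-loop C = A·B next to a loop-blocked variant that improves cache locality. It is a pure-Python illustration of the loop structure under an assumed small, irregular shape; production ARMv8 kernels rely on NEON intrinsics and hand-tuned assembly rather than anything like this.

```python
# Minimal GEMM sketch: naive triple loop vs. a cache-blocked variant.
# Illustrative only; block size would be tuned per shape and cache level.
import numpy as np

def gemm_naive(A, B):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

def gemm_blocked(A, B, bs=4):
    # Loop blocking keeps small tiles of A, B, and C resident in cache.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i0 in range(0, m, bs):
        for j0 in range(0, n, bs):
            for p0 in range(0, k, bs):
                C[i0:i0+bs, j0:j0+bs] += A[i0:i0+bs, p0:p0+bs] @ B[p0:p0+bs, j0:j0+bs]
    return C

A = np.random.rand(5, 7)   # small, irregular shape of the kind the abstract mentions
B = np.random.rand(7, 3)
assert np.allclose(gemm_naive(A, B), A @ B)
assert np.allclose(gemm_blocked(A, B), A @ B)
```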
-
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-09 Linpeng Jia, Yanxiu Liu, Keyuan Wang, Yi Sun
Sharding is one of the most promising technologies for significantly increasing blockchain transaction throughput. However, as the number of shards increases, the ratio of cross-shard transactions in existing blockchain sharding protocols gradually approaches 100%. Since cross-shard transactions consume many times more resources than intra-shard transactions, the processing overhead of cross-shard
-
EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-04 Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant
-
Real-Time Offloading for Dependent and Parallel Tasks in Cloud-Edge Environments Using Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-03 Xing Chen, Shengxi Hu, Chujia Yu, Zheyi Chen, Geyong Min
As an effective technique to relieve the problem of resource constraints on mobile devices (MDs), computation offloading utilizes powerful cloud and edge resources to process the computation-intensive tasks of mobile applications uploaded from MDs. In cloud-edge computing, the resources (e.g., cloud and edge servers) that can be accessed by mobile applications may change dynamically. Meanwhile
-
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-15 Dongsheng Li, Shengwei Li, Zhiquan Lai, Yongquan Fu, Xiangyu Ye, Lei Cai, Linbo Qiao
With the increasing volumes of data samples and deep neural network (DNN) models, efficiently scaling the training of DNN models has become a significant challenge for server clusters with AI accelerators in terms of memory and computing efficiency. Existing parallelism schemes can be broadly classified into three categories: data parallelism (splitting data samples), model parallelism (splitting model
-
TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-06 Daoce Wang, Jesus Pulido, Pascal Grosset, Sian Jin, Jiannan Tian, Kai Zhao, James Ahrens, Dingwen Tao
Today's scientific simulations require significant data volume reduction because of the enormous amounts of data produced and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation
-
Batch Jobs Load Balancing Scheduling in Cloud Computing Using Distributional Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Tiangang Li, Shi Ying, Yishi Zhao, Jianga Shang
In cloud computing, how to reasonably allocate computing resources for batch jobs to ensure the load balance of dynamic clusters and meet user requests is an important and challenging task. Most existing studies are based on deep Q network, which utilizes neural networks to estimate the expected value of cumulative return in the scheduling process. The value-based DQN algorithms ignore the complete
-
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Yaozheng Fang, Zhiyuan Zhou, Surong Dai, Jinni Yang, Hui Zhang, Ye Lu
The performance bottleneck of blockchain has shifted from consensus to serial smart contract execution in transaction validation. Previous works predominantly focus on inter-contract parallel execution, but they fail to address the inherent performance limitations of individual smart contract execution. In this paper, we propose PaVM, the first smart contract virtual machine that supports both inter-contract
-
Enabling Efficient Erasure Coding in Disaggregated Memory Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-15 Qiliang Li, Liangliang Xu, Yongkun Li, Min Lyu, Wei Wang, Pengfei Zuo, Yinlong Xu
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM with low memory cost. In DM with EC, objects are first coded in compute servers, then directly written to memory servers via high-speed networks like one-sided RDMA. However, as the one-sided RDMA latency goes down to the microsecond level, coding
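As a concrete illustration of the coding step described above, here is a minimal sketch assuming a simple k+1 XOR parity code (real DM systems usually use Reed-Solomon); the dict standing in for memory servers and the "RDMA write" are simulated.

```python
# Minimal sketch of erasure coding before writing to memory servers, assuming
# k data blocks plus one XOR parity block (not the paper's actual code layout).
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(obj: bytes, k: int):
    """Split an object into k equal data blocks plus one XOR parity block."""
    padded = obj + b"\x00" * ((-len(obj)) % k)
    size = len(padded) // k
    blocks = [padded[i*size:(i+1)*size] for i in range(k)]
    parity = reduce(xor_blocks, blocks)
    return blocks, parity

def recover(blocks, parity, lost_index):
    """Rebuild a single lost data block from the survivors and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return reduce(xor_blocks, survivors, parity)

# Code in the "compute server", then write each block to a different "memory server".
blocks, parity = encode(b"disaggregated memory object", k=3)
memory_servers = {i: blk for i, blk in enumerate(blocks)}
memory_servers["parity"] = parity

# Simulate losing block 1 and rebuilding it from the survivors.
rebuilt = recover(blocks, parity, lost_index=1)
assert rebuilt == blocks[1]
```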
-
Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-14 Zhe Jiang, Kecheng Yang, Nathan Fisher, Nan Guan, Neil C. Audsley, Zheng Dong
Following the trend of increasing autonomy in real-time systems, multi-core System-on-Chips (SoCs) have enabled devices to better handle the large streams of data and intensive computation required by such autonomous systems. In modern multi-core SoCs, each L1 cache is designed to be tied to an individual processor, and a processor can only access its own L1 cache. This design method ensures the system's
-
Enabling Streaming Analytics in Satellite Edge Computing via Timely Evaluation of Big Data Queries IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-13 Zichuan Xu, Guangyuan Xu, Hao Wang, Weifa Liang, Qiufen Xia, Shangguang Wang
Internet-of-Things (IoT) applications from many industries, such as transportation (maritime, road, rail, air) and fleet management, offshore monitoring, and farming, are located in remote areas without cellular connectivity. Such IoT applications continuously generate stream data with hidden values that need to be unveiled in real time. Streaming analytics is emerging as a popular type of Big Data analytics
-
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-10 Changlong Li, Yu Liang, Liang Shi, Chao Wang, Chun Jason Xue, Xuehai Zhou
This article presents LegoSwap, a cross-device memory swapping mechanism for mobile devices. It exploits the unbalanced utilization of memory resources across devices. With LegoSwap, remote memory is utilized in a seamless plug-and-play manner. It achieves comparable-to-local swapping performance based on existing network infrastructure. In addition, LegoSwap is free from the effect of remote I/O disconnection
-
US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-09 Yunqi Gao, Bing Hu, Mahdi Boloursaz Mashhadi, A-Long Jin, Pei Xiao, Chunming Wu
The communication bottleneck severely constrains the scalability of distributed deep learning, and efficient communication scheduling accelerates distributed DNN training by overlapping computation and communication tasks. However, existing approaches based on tensor partitioning are not efficient and suffer from two challenges: 1) the fixed number of tensor blocks transferred in parallel cannot necessarily
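The overlap idea can be sketched as follows, assuming a background thread that drains a queue of gradient blocks while the next layer's computation proceeds; the fixed block size and the simulated send function are illustrative stand-ins, not US-Byte's unequal-block schedule.

```python
# Minimal sketch of overlapping communication with computation via tensor
# partitioning; "send" stands in for an allreduce/push over the network.
import queue
import threading
import time

import numpy as np

def send(block):
    """Pretend network transfer, proportional to block size."""
    time.sleep(block.size / 1e6)

def comm_worker(q):
    while True:
        block = q.get()
        if block is None:
            break
        send(block)
        q.task_done()

send_q = queue.Queue()
threading.Thread(target=comm_worker, args=(send_q,), daemon=True).start()

layer_grads = [np.random.rand(1000), np.random.rand(4000), np.random.rand(2000)]
block_size = 1500                       # fixed here; US-Byte sizes blocks unequally

for grad in layer_grads:                # "backward pass", one layer at a time
    for start in range(0, grad.size, block_size):
        send_q.put(grad[start:start + block_size])  # transfer overlaps with...
    time.sleep(0.02)                                # ...the next layer's compute

send_q.join()                           # wait for all outstanding blocks
send_q.put(None)                        # stop the communication thread
```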
-
Demystifying the Cost of Serverless Computing: Towards a Win-Win Deal IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-07 Fangming Liu, Yipei Niu
Serverless is an emerging computing paradigm that greatly simplifies the development, deployment, and maintenance of cloud applications. However, due to potential cost issues brought by the widely adopted pricing models, it is difficult to answer how to use and operate serverless computing services from the perspectives of users and providers. To demystify the cost of serverless computing, we present one
-
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-06 Isam Mashhour Al Jawarneh, Paolo Bellavista, Antonio Corradi, Luca Foschini, Rebecca Montanari
The widespread adoption of the Internet of Things (IoT) motivated the emergence of mixed workloads in smart cities, where fast-arriving geo-referenced big data streams are joined with archive tables, aiming at enriching streams with descriptive attributes that enable insightful analytics. Applications now rely on finding, in real time, the geographical regions to which streaming data tuples belong.
-
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-25 Yuan Li, Ahmed Louri, Avinash Karanth
Large-scale deep neural network (DNN) accelerators are poised to facilitate the concurrent processing of diverse DNNs, imposing demanding challenges on the interconnection fabric. These challenges encompass overcoming performance degradation and energy increase associated with system scaling while also necessitating flexibility to support dynamic partitioning and adaptable organization of compute resources
-
Parallel and Distributed Bayesian Network Structure Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-23 Jian Yang, Jiantong Jiang, Zeyi Wen, Ajmal Mian
Bayesian networks (BNs) are graphical models representing uncertainty in causal discovery, and have been widely used in medical diagnosis and gene analysis due to their effectiveness and good interpretability. However, mainstream BN structure learning methods are computationally expensive, as they must perform numerous conditional independence (CI) tests to decide the existence of edges. Some researchers
-
Online Learning Algorithms for Context-Aware Video Caching in D2D Edge Networks IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-20 Qiufen Xia, Zhiwei Jiao, Zichuan Xu
With the emergence of various short video platforms such as TikTok and Instagram, coupled with the accelerated pace of people's lives, people are spending more time sharing and watching online videos than ever before, and they gradually turn their attention to short videos with short duration and novel content. Browsing and watching short videos by users with their energy-capacitated devices, such
-
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-18 Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, Depei Qian
Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed
-
FedHAP: Federated Hashing With Global Prototypes for Cross-Silo Retrieval IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-16 Meilin Yang, Jian Xu, Wenbo Ding, Yang Liu
Deep hashing has been widely applied in large-scale data retrieval due to its superior retrieval efficiency and low storage cost. However, data are often scattered in data silos with privacy concerns, so performing centralized data storage and retrieval is not always possible. Leveraging the approach of federated learning (FL) to perform deep hashing is a recent research trend. However, existing frameworks
-
UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-20 Yizhi Huang, Yan Liu, Yang Bai, Si Chen, Renfa Li
Recent research has shown that collaborative computing of CPUs and GPUs in the same system can effectively accelerate large-scale SGD-based matrix factorization (MF), but it faces the problem of limited scalability due to parameter synchronization in the server. Theoretically, asynchronous methods can overcome this shortcoming. However, through a series of tests, observations, and analyses, we realize
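A minimal sketch of the asynchronous idea follows, assuming Hogwild-style lock-free SGD updates to shared factor matrices from several worker threads; it stands in for the CPU/GPU asynchronous scheme in spirit only and is not UMA-MF itself.

```python
# Minimal sketch of asynchronous (lock-free, Hogwild-style) SGD matrix
# factorization on shared factors; illustrative only, not UMA-MF.
import random
import threading

import numpy as np

rng = np.random.default_rng(0)
m, n, rank = 50, 40, 8
R = rng.random((m, n))                      # observations to factorize
P = 0.1 * rng.random((m, rank))             # shared factors, updated without locks
Q = 0.1 * rng.random((n, rank))

def sgd_worker(samples, lr=0.01, reg=0.02, epochs=20):
    for _ in range(epochs):
        for i, j in samples:
            err = R[i, j] - P[i] @ Q[j]
            P[i] += lr * (err * Q[j] - reg * P[i])   # asynchronous in-place updates
            Q[j] += lr * (err * P[i] - reg * Q[j])

pairs = [(i, j) for i in range(m) for j in range(n)]
random.shuffle(pairs)
shards = [pairs[k::4] for k in range(4)]             # one shard per worker thread
threads = [threading.Thread(target=sgd_worker, args=(s,)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()

rmse = np.sqrt(np.mean((R - P @ Q.T) ** 2))
print(f"RMSE after async SGD: {rmse:.3f}")
```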
-
Automatic Multi-Parameter Performance Modeling of HPC Applications on a New Sunway Supercomputer IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-19 Yilian Zhang, Yao Liu, Penglong Jiao, Yiping Zhou, Tongquan Wei
As the successor to Sunway TaihuLight, the new Sunway supercomputer has ultra-high computing capacity, but the unique heterogeneous architecture presents performance optimization challenges for High Performance Computing (HPC) applications. Performance modeling is an effective way to discover the performance bottlenecks and then improve the performance of HPC applications. Existing performance modeling
-
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-12 Rong Gu, Zhihao Xu, Yang Che, Xu Wang, Haipeng Dai, Kai Zhang, Bin Fan, Haojun Hou, Li Yi, Yu Ding, Yihua Huang, Guihai Chen
Nowadays, it is prevalent to train deep learning models on cloud-native platforms that actively leverage containerization and orchestration technologies for high elasticity, low and flexible operation cost, and many other benefits. However, it also faces new challenges, and our work focuses on those related to I/O throughput for training, including complex data access, lack of matching dynamic I/O
-
Joint Deployment and Request Routing for Microservice Call Graphs in Data Centers IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-11 Yi Hu, Hao Wang, Liangyuan Wang, Menglan Hu, Kai Peng, Bharadwaj Veeravalli
Microservices are an architectural and organizational paradigm for Internet application development. In cloud data centers, delay-sensitive applications receive massive user requests, which are fed into multiple queues and subsequently served by multiple microservice instances. Accordingly, effective deployment of multiple queues and containers can significantly reduce queuing delay, processing delay
-
Divide&Content: A Fair OS-Level Resource Manager for Contention Balancing on NUMA Multicores IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-30 Carlos Bilbao, Juan Carlos Saez, Manuel Prieto-Matias
Chip multicore processors (CMPs) constitute the cherry-picked architecture for high-performance servers employed in supercomputers and cloud datacenters. In the last few years, Non-Uniform Memory Access (NUMA) multicore systems have become the dominant choice in these domains. Despite technology advances that make it possible to pack an increasing number of cores and bigger caches on the same chip, contention
-
Multi-SP Network Slicing Parallel Relieving Edge Network Conflict IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-30 Rongxin Han, Dezhi Chen, Song Guo, Jingyu Wang, Qi Qi, Lu Lu, Jianxin Liao
Network slicing is rapidly prevailing in the edge network, providing computing, network, and storage resources for various services. When multiple service providers (SPs) respond to their tenants in parallel, individual decisions on the dynamic and shared edge network may lead to resource conflicts, which affects the delivery of network slicing services. Existing works ignore resource interaction
-
TDTA: Topology-Based Real-Time DAG Task Allocation on Identical Multiprocessor Platforms IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-30 Yulong Wu, Weizhe Zhang, Nan Guan, Yehan Ma
Modern real-time systems contain complex workloads, which are usually modeled as directed acyclic graph (DAG) tasks and deployed on multiprocessor platforms. The complex execution logic of DAG tasks results in excessive schedulability analysis overhead, and the current DAG task allocation strategy cannot efficiently utilize processor resources (inner parallelization of DAG tasks). In this article,
-
SketchINT: Empowering INT With TowerSketch for Per-Flow Per-Switch Measurement IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-28 Kaicheng Yang, Sheng Long, Qilong Shi, Yuanpeng Li, Zirui Liu, Yuhan Wu, Tong Yang, Zhengyi Jia
Network measurement is indispensable to network operations. INT solutions, which provide fine-grained per-switch per-packet information, are promising for per-flow per-switch measurement. The main shortcoming of INT is the high network overhead incurred by collecting INT information, making INT impractical for production deployment. Sketches that can compactly record per-flow information
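As a point of reference for what such compact per-flow structures look like, below is a minimal Count-Min sketch; SketchINT itself builds on TowerSketch, so this is only an assumed stand-in for the general technique.

```python
# Minimal Count-Min sketch: a compact, collision-tolerant per-flow counter.
# Illustrative stand-in; not TowerSketch or SketchINT's data plane layout.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, flow_id: str, row: int) -> int:
        h = hashlib.sha256(f"{row}:{flow_id}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, flow_id: str, count: int = 1):
        for row in range(self.depth):
            self.table[row][self._index(flow_id, row)] += count

    def estimate(self, flow_id: str) -> int:
        # Minimum over rows: an overestimate bounded by hash collisions.
        return min(self.table[row][self._index(flow_id, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for pkt_flow in ["10.0.0.1->10.0.0.2"] * 500 + ["10.0.0.3->10.0.0.4"] * 20:
    cms.add(pkt_flow)
print(cms.estimate("10.0.0.1->10.0.0.2"))   # ~500; Count-Min never underestimates
```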
-
Back to Homogeneous Computing: A Tightly-Coupled Neuromorphic Processor With Neuromorphic ISA IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-22 Zhijie Yang, Lei Wang, Wei Shi, Yao Wang, Junbo Tie, Feng Wang, Xiang Yu, Linghui Peng, Chao Xiao, Xun Xiao, Yao Yao, Gan Zhou, Xuhu Yu, Rui Gong, Xia Zhao, Yuhua Tang, Weixia Xu
In recent years, neuromorphic processors have been widely used in many scenarios, showing extreme energy efficiency over traditional architectures. However, almost all existing neuromorphic hardware follows a heterogeneous computing methodology without an Instruction Set Architecture (ISA), leading to inflexibility in programming. In this paper, we first propose a RISC-V Neuromorphic Extension (RVNE)
-
Adaptive Data Placement in Multi-Cloud Storage: A Non-Stationary Combinatorial Bandit Approach IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-17 Li Li, Jiajie Shen, Bochun Wu, Yangfan Zhou, Xin Wang, Keqin Li
Multi-cloud storage has recently become a viable approach to solving the vendor lock-in, reliability, and security issues in cloud storage systems. As a key concern, data placement influences the cost and performance of storage services. Yet, in practice it remains challenging to address the huge solution space. Previous studies typically focus on constructing efficient data placement schemes based on the predicted
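A minimal sketch of the bandit framing, assuming each cloud provider is an arm, the reward is the negative observed cost, and a discount factor handles non-stationary conditions; the provider names and the drifting cost model are made up, and this is not the paper's combinatorial algorithm.

```python
# Minimal discounted epsilon-greedy bandit for choosing where to place data.
# Hypothetical providers and cost model; illustrative only.
import random

CLOUDS = ["cloud_a", "cloud_b", "cloud_c"]

def observe_cost(cloud: str, t: int) -> float:
    # Made-up, drifting per-request cost for each provider.
    base = {"cloud_a": 1.0, "cloud_b": 1.2, "cloud_c": 0.9}[cloud]
    drift = 0.5 if (cloud == "cloud_c" and t > 500) else 0.0   # provider degrades
    return base + drift + random.gauss(0, 0.05)

value = {c: 0.0 for c in CLOUDS}    # discounted cumulative reward per arm
count = {c: 1e-9 for c in CLOUDS}   # discounted pull counts (avoid divide-by-zero)
gamma, epsilon = 0.99, 0.1          # discounting copes with non-stationarity

for t in range(1000):
    if random.random() < epsilon:
        choice = random.choice(CLOUDS)                          # explore
    else:
        choice = max(CLOUDS, key=lambda c: value[c] / count[c]) # exploit
    reward = -observe_cost(choice, t)
    for c in CLOUDS:                # decay old observations for every arm
        value[c] *= gamma
        count[c] *= gamma
    value[choice] += reward
    count[choice] += 1

print(max(CLOUDS, key=lambda c: value[c] / count[c]))  # best provider right now
```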
-
IO-Sets: Simple and Efficient Approaches for I/O Bandwidth Management IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-15 Francieli Boito, Guillaume Pallez, Luan Teylo, Nicolas Vidal
One of the main performance issues faced by high-performance computing platforms is the congestion caused by concurrent I/O from applications. When this happens, the platform's overall performance and utilization are harmed. From the extensive work in this field, I/O scheduling is the essential solution to this problem. The main drawback of current techniques is the amount of information needed about
-
The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-15 Junxu Xia, Geyao Cheng, Lailong Luo, Deke Guo, Pin Lv, Bowen Sun
Placing popular data at the network edge helps reduce retrieval latency, but it also brings challenges to the limited edge storage space. Currently, using available yet not necessarily reliable edge resources is common practice for edge space expansion, while deploying deduplication storage strategies is a general method for better space utilization. However, a contradiction arises when jointly implementing
-
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-10 Zhida Jiang, Yang Xu, Hongli Xu, Lun Wang, Chunming Qiao, Liusheng Huang
Recently, mobile and embedded devices worldwide generate a massive amount of data at the network edge. To efficiently exploit the data from distributed devices, we concentrate on decentralized machine learning (DML), where the workers collaboratively train models under the peer-to-peer (P2P) setting. DML avoids the bottleneck of the parameter server (PS) by enabling the workers to exchange local models
-
CloudSentry: Two-Stage Heavy Hitter Detection for Cloud-Scale Gateway Overload Protection IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-08-04 Jianyuan Lu, Tian Pan, Shan He, Mao Miao, Guangzhe Zhou, Yining Qi, Shize Zhang, Enge Song, Xiaoqing Sun, Huaiyi Zhao, Biao Lyu, Shunmin Zhu
Cloud vendors provide shared resources for millions of tenants across the world to achieve economies of scale. At the same time, the cloud network maintains performance isolation between different tenants as if they used their own private, dedicated resources. However, heavy hitters caused by a single tenant at cloud gateways will break such isolation, undermining the predictable performance expected