
-
Pebbles: Leveraging Sketches for Processing Voluminous, High Velocity Data Streams IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-28 Thilina Buddhika; Sangmi Lee Pallickara; Shrideep Pallickara
Voluminous, time-series data streams originating in continuous sensing environments pose data ingestion and processing challenges. We present a holistic methodology centered around data sketching to address both challenges. We introduce an order-preserving sketching algorithm that we have designed for space-efficient representation of multi-feature streams with native support for stream processing
-
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-19 Shaohuai Shi; Xiaowen Chu; Bo Li
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this article, we observe that many
-
Burst Load Evacuation Based on Dispatching and Scheduling In Distributed Edge Networks IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-18 Shuiguang Deng; Cheng Zhang; Chang Li; Jianwei Yin; Schahram Dustdar; Albert Y. Zomaya
Edge computing, a fast evolving computing paradigm, has spawned a variety of new system architectures and computing methods discussed in both academia and industry. Edge servers are directly deployed near users’ equipment or devices owned by telecommunications companies. This allows for offloading computing tasks of various devices nearby to edge servers. Due to the shortage of computing resources
-
An Optimized Weighted Average Makespan in Fault-Tolerant Heterogeneous MPSoCs IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-20 Hassan Youness; Aly Omar; Mohamed Moness
Multiprocessor systems-on-chip (MPSoCs) are today considered the core of most modern systems. Most applications of these heterogeneous MPSoCs include critical systems, and hence fault tolerance and reliability have become essential. Task replication is a technique for achieving fault tolerance and can help reduce the schedule length by increasing locality. It introduces an upper
-
DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-19 Yanghua Peng; Yixin Bao; Yangrui Chen; Chuan Wu; Chen Meng; Wei Lin
Efficient resource scheduling is essential for maximal utilization of expensive deep learning (DL) clusters. Existing cluster schedulers either are agnostic to machine learning (ML) workload characteristics, or use scheduling heuristics based on operators’ understanding of particular ML framework and workload, which are less efficient or not general enough. In this article, we show that DL techniques
-
e-PoS: Making Proof-of-Stake Decentralized and Fair IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-01 Muhammad Saad; Zhan Qin; Kui Ren; DaeHun Nyang; David Mohaisen
Blockchain applications that rely on the Proof-of-Work (PoW) have increasingly become energy inefficient with a staggering carbon footprint. In contrast, energy efficient alternative consensus protocols such as Proof-of-Stake (PoS) may cause centralization and unfairness in the blockchain system. To address these challenges, we propose a modular version of PoS-based blockchain systems called e-PoS
-
True Load Balancing for Matricized Tensor Times Khatri-Rao Product IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-22 Nabil Abubaker; Seher Acer; Cevdet Aykanat
MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be
-
A GPU Acceleration Framework for Motif and Discord Based Pattern Mining IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-02-01 Biru Zhu; Youyou Jiang; Ming Gu; Yangdong Deng
With the fast digitalization of our society, mining patterns from large time series data is increasingly becoming a critical problem for a wide range of big data applications. Motif and discord discovery algorithms, which offer effective solutions to identify repeatedly appearing and abnormal patterns, respectively, are fundamental building blocks for time series processing. Both approaches, however
-
Hone: Mitigating Stragglers in Distributed Stream Processing With Tuple Scheduling IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-12 Wenxin Li; Duowen Liu; Kai Chen; Keqiu Li; Heng Qi
Low latency stream processing on large clusters consisting of hundreds to thousands of servers is an increasingly important challenge. A crucial barrier to tackling this challenge is stragglers, i.e., tasks that are significantly straggling behind others in processing the stream data. However, prior straggler mitigation solutions have significant limitations. They balance streaming workloads among
-
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-02-01 Mochamad Asri; Dhairya Malhotra; Jiajun Wang; George Biros; Lizy K. John; Andreas Gerstlauer
In this article, we study performance and energy saving benefits of hardware acceleration under different hardware configurations and usage scenarios for a state-of-the-art Fast Multipole Method (FMM), which is a popular N-body method. We use a dedicated Application Specific Integrated Circuit (ASIC) to accelerate General Matrix-Matrix Multiply (GEMM) operations. FMM is widely used in applications
-
Guest Editorial IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-02-18 Pavan Balaji; Jidong Zhai; Min Si
The papers in this special section present the state-of-the-art technologies and the challenges of parallel and distributed computing techniques for artificial intelligence (AI), machine learning (ML), and deep learning (DL). AI, ML, and DL have established themselves in a multitude of domains because of their ability to process and model unstructured input data.
-
Learning Spatiotemporal Failure Dependencies for Resilient Edge Computing Services IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-22 Atakan Aral; Ivona Brandić
Edge computing services are exposed to infrastructural failures due to geographical dispersion, ad hoc deployment, and rudimentary support systems. Two unique characteristics of the edge computing paradigm necessitate a novel failure resilience approach. First, edge servers, contrary to cloud counterparts with reliable data center networks, are typically connected via ad hoc networks. Thus, link failures
-
Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-06 Daning Cheng; Shigang Li; Hanping Zhang; Fen Xia; Yunquan Zhang
As the training dataset size and the model size of machine learning increase rapidly, more computing resources are consumed to speed up the training process. However, the scalability and performance reproducibility of parallel machine learning training, which mainly uses stochastic optimization algorithms, are limited. In this paper, we demonstrate that the sample difference in the dataset plays a prominent
-
Efficient Methods for Mapping Neural Machine Translator on FPGAs IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-25 Qin Li; Xiaofan Zhang; Jinjun Xiong; Wen-Mei Hwu; Deming Chen
Neural machine translation (NMT) is one of the most critical applications in natural language processing (NLP) with the main idea of converting text in one language to another using deep neural networks. In recent years, we have seen continuous development of NMT by integrating more emerging technologies, such as bidirectional gated recurrent units (GRU), attention mechanisms, and beam-search algorithms
-
Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-20 Long Cheng; Ying Wang; Qingzhi Liu; Dick H.J. Epema; Cheng Liu; Ying Mao; John Murphy
Large data centers are currently the mainstream infrastructures for big data processing. As one of the most fundamental tasks in these environments, the efficient execution of distributed data operators (e.g., join and aggregation) still challenges current data systems, and one of the key performance issues is network communication time. State-of-the-art methods trying to improve that problem
-
Partitioning-Based Scheduling of OpenMP Task Systems With Tied Tasks IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-31 Yang Wang; Xu Jiang; Nan Guan; Zhishan Guo; Xue Liu; Wang Yi
OpenMP is a popular programming framework in both general and high-performance computing and has recently drawn much interest in embedded and real-time computing. Although the execution semantics of OpenMP are similar to the DAG task model, the constraints posed by the OpenMP specification make them significantly more challenging to analyze. A tied task is an important feature in OpenMP that must execute
-
Reliability and Confidentiality Co-Verification for Parallel Applications in Distributed Systems IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-08 Guoqi Xie; Kehua Yang; Haibo Luo; Renfa Li; Shiyan Hu
Co-verification of reliability and confidentiality is a necessary process for safety- and security-critical applications. While these two objectives are conflicting, preassignment has emerged as an effective and efficient verification solution. In this article, we propose two preassignment-based co-verification techniques, namely, Blocks-based Vulnerability Preassignment (BVP) and Reversed Blocks-based
-
A Machine-Learning-Based Framework for Productive Locality Exploitation IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-13 Engin Kayraklioglu; Erwan Favry; Tarek El-Ghazawi
Data locality is of extreme importance in programming distributed-memory architectures due to its implications on latency and energy consumption. Automated compiler and runtime system optimization studies have attempted to improve data locality exploitation without burdening the programmer. However, due to the difficulty of static code analysis, conservatism in compiler optimizations to avoid errors
-
Co-Active: A Workload-Aware Collaborative Cache Management Scheme for NVMe SSDs IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-15 Hui Sun; Shangshang Dai; Jianzhong Huang; Xiao Qin
When it comes to NAND Flash-based solid-state disks (SSDs), cache can narrow the performance gap between user-level I/Os and flash memory. Cache management schemes impose relentless impacts on the endurance and performance of flash memory. A vast majority of existing cache management techniques adopt a passive data-update style (e.g., GCaR, LCR), thereby undermining response times in burst I/O requests-based
-
Reversible CSP Computations IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-14 Carlos Galindo; Naoki Nishida; Josep Silva; Salvador Tamarit
Reversibility enables a program to be executed both forwards and backwards. This ability allows programmers to backtrack the execution to a previous state. This is essential if the computation is not deterministic because re-running the program forwards may not lead to that state of interest. Reversibility of sequential programs has been well studied and a strong theoretical basis exists. Contrarily
-
A Parallel Jacobi-Embedded Gauss-Seidel Method IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-15 Afshin Ahmadi; Felice Manganiello; Amin Khademi; Melissa C. Smith
A broad range of scientific simulations involve solving large-scale computationally expensive linear systems of equations. Iterative solvers are typically preferred over direct methods when it comes to large systems due to their lower memory requirements and shorter execution times. However, selecting the appropriate iterative solver is problem-specific and dependent on the type and symmetry of the
-
A High-Throughput FPGA Accelerator for Short-Read Mapping of the Whole Human Genome IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-12 Yen-Lung Chen; Bo-Yi Chang; Chia-Hsiang Yang; Tzi-Dar Chiueh
The mapping of DNA subsequences to a known reference genome, referred to as “short-read mapping”, is essential for next-generation sequencing. Hundreds of millions of short reads need to be aligned to a tremendously long reference sequence, making short-read mapping very time consuming. In this article, a high-throughput hardware accelerator is proposed so as to accelerate this task. A Bloom filter-based
-
A Scalable Platform for Distributed Object Tracking Across a Many-Camera Network IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-05 Aakash Khochare; Aravindhan Krishnan; Yogesh Simmhan
Advances in deep neural networks (DNN) and computer vision (CV) algorithms have made it feasible to extract meaningful insights from large-scale deployments of urban cameras. Tracking an object of interest across the camera network in near real-time is a canonical problem. However, current tracking platforms have two key limitations: 1) They are monolithic, proprietary and lack the ability to rapidly
-
A Scalable Stateful Approach for Virtual Security Functions Orchestration IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-08 Niloofar Moradi; Alireza Shameli-Sendi; Alireza Khajouei
Previous works suggested different approaches to implementing service chaining. Their goal is to enhance the performance of the middleboxes and satisfy the expectations of the cloud providers and users. To meet these expectations, the delay factor, i.e., flow through the low-cost paths, as well as the best node processing factor, are considered. Achieving these two goals simultaneously turns the middlebox
-
Rings for Privacy: An Architecture for Large Scale Privacy-Preserving Data Mining IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-05 Maria Luisa Merani; Daniele Croce; Ilenia Tinnirello
This article proposes a new architecture for privacy-preserving data mining based on Multi Party Computation (MPC) and secure sums. While traditional MPC approaches rely on a small number of aggregation peers replacing a centralized trusted entity, the current study puts forth a distributed solution that involves all data sources in the aggregation process, with the help of a single server for storing
-
Distributed Adaptive Consensus Tracking Control for Multi-Agent System With Communication Constraints IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-31 Pu Zhang; Huifeng Xue; Shan Gao; Jialong Zhang
Aiming at a class of high-order strict feedback nonlinear multi-agent systems with communication constraints, a novel distributed adaptive back-stepping control method is proposed to cooperatively track the moving targets. First, five agents are used as controlled objects, and all five agents form a “leader-follower” mode with a distributed control structure. Meanwhile, the leader's moving velocity
-
On Consortium Blockchain Consistency: A Queueing Network Model Approach IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-08 Tianhui Meng; Yubin Zhao; Katinka Wolter; Cheng-Zhong Xu
Analyzing blockchain protocols is a notoriously difficult task due to the underlying large scale distributed networks. To address this problem, stochastic model-based approaches are often utilized. However, the abstract models in prior work turn out not to be adoptable to consortium blockchains as the consensus of such a blockchain often consists of multiple processes. To address the lack of efficient
-
Distributed and Dynamic Service Placement in Pervasive Edge Computing Networks IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-21 Zhaolong Ning; Peiran Dong; Xiaojie Wang; Shupeng Wang; Xiping Hu; Song Guo; Tie Qiu; Bin Hu; Ricky Y. K. Kwok
The explosive growth of mobile devices promotes the prosperity of novel mobile applications, which can be realized by service offloading with the assistance of edge computing servers. However, due to limited computation and storage capabilities of a single server, long service latency hinders the continuous development of service offloading in mobile networks. By supporting multi-server cooperation
-
E2bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-28 Weihao Cui; Quan Chen; Han Zhao; Mengze Wei; Xiaoxin Tang; Minyi Guo
We aim to tackle existing problems about deep learning serving on GPUs in the view of the system. GPUs have been widely adopted to serve online deep learning-based services that have stringent QoS (Quality-of-Service) requirements. However, emerging deep learning serving systems often result in poor responsiveness and low throughput of the inferences that damage user experience and increase the number
-
Efficient Buffer Overflow Detection on GPU IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-08 Bang Di; Jianhua Sun; Hao Chen; Dong Li
Rich thread-level parallelism of GPU has motivated co-running GPU kernels on a single GPU. However, when GPU kernels co-run, it is possible that one kernel can leverage buffer overflow to attack another kernel running on the same GPU. There is very limited work aiming to detect buffer overflow for GPU. Existing work has either large performance overhead or limited capability in detecting buffer overflow
-
PaKman: A Scalable Algorithm for Generating Genomic Contigs on Distributed Memory Machines IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-08 Priyanka Ghosh; Sriram Krishnamoorthy; Ananth Kalyanaraman
De novo genome assembly is a fundamental problem in the field of bioinformatics, that aims to assemble the DNA sequence of an unknown genome from numerous short DNA fragments (aka reads) obtained from it. With the advent of high-throughput sequencing technologies, billions of reads can be generated in a matter of hours, necessitating efficient parallelization of the assembly process. While multiple
-
Auditing Cache Data Integrity in the Edge Computing Environment IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-11 Bo Li; Qiang He; Feifei Chen; Hai Jin; Yang Xiang; Yun Yang
Edge computing allows app vendors to deploy their applications and relevant data on distributed edge servers to serve nearby users. Caching data on edge servers can minimize users’ data retrieval latency. However, such cache data are subject to both intentional and accidental corruption in the highly distributed, dynamic, and volatile edge computing environment. Given a large number of edge servers
-
A Case for Pricing Bandwidth: Sharing Datacenter Networks With Cost Dominant Fairness IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-18 Li Chen; Yuan Feng; Baochun Li; Bo Li
Unlike other resources such as CPU or memory in a virtual machine, inter-virtual-machine (inter-VM) bandwidth has not been explicitly priced in datacenter networks. In this article, we argue that tenants of an IaaS cloud computing platform should be given the flexibility to pay more for explicitly priced datacenter bandwidth beyond traditional virtual machines, in order to achieve better (or more predictable)
-
2020 Reviewers List* IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2021-01-11
Presents the list of reviewers who contributed to this publication in 2020.
-
Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-16 Teng Yu; Runxin Zhong; Vladimir Janjic; Pavlos Petoumenos; Jidong Zhai; Hugh Leather; John Thomson
Asymmetric multicore processors (AMP) offer multiple types of cores under the same programming interface. Extracting the full potential of AMPs requires intelligent scheduling decisions, matching each thread with the right kind of core, the core that will maximize performance or minimize wasted energy for this thread. Existing OS schedulers are not up to this task. While they may handle certain aspects
-
On the Effective Parallelization and Near-Optimal Deployment of Service Function Chains IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-21 Jianzhen Luo; Jun Li; Lei Jiao; Jun Cai
Network operators compose Service Function Chains (SFCs) by tying different network functions (e.g., packet inspection, flow shaping, network address translation) together and process traffic flows in the order the network functions are chained. Leveraging the technique of Network Function Virtualization (NFV), each network function can be “virtualized” and decoupled from its dedicated hardware, and
-
Design and Implementation of a Criticality- and Heterogeneity-Aware Runtime System for Task-Parallel Applications IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-23 Myeonggyun Han; Jinsu Park; Woongki Baek
Heterogeneous multiprocessing (HMP) is an emerging technology for high-performance and energy-efficient computing. While task parallelism is widely used in various computing domains, such as embedded, big-data, and machine-learning computing domains, it still remains unexplored to investigate the efficient runtime support that effectively utilizes the criticality of the tasks of the target application
-
A Scalable Multi-Layer PBFT Consensus for Blockchain IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-03 Wenyu Li; Chenglin Feng; Lei Zhang; Hao Xu; Bin Cao; Muhammad Ali Imran
Practical Byzantine Fault Tolerance (PBFT) consensus mechanism shows a great potential to break the performance bottleneck of the Proof-of-Work (PoW)-based blockchain systems, which typically support only dozens of transactions per second and require minutes to hours for transaction confirmation. However, due to frequent inter-node communications, PBFT mechanism has a poor node scalability and thus
-
Profiles of Upcoming HPC Applications and Their Impact on Reservation Strategies IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-23 Ana Gainaru; Brice Goglin; Valentin Honoré; Guillaume Pallez
With the expected convergence between HPC, BigData and AI, new applications with different profiles are coming to HPC infrastructures. We aim at better understanding the features and needs of these applications in order to be able to run them efficiently on HPC platforms. The approach followed is bottom-up: we study thoroughly an emerging application, Spatially Localized Atlas Network Tiles (SLANT
-
IPPTS: An Efficient Algorithm for Scientific Workflow Scheduling in Heterogeneous Computing Systems IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-02 Hamza Djigal; Jun Feng; Jiamin Lu; Jidong Ge
Efficient scheduling algorithms are key for attaining high performance in heterogeneous computing systems. In this article, we propose a new list scheduling algorithm for assigning task graphs to fully connected heterogeneous processors with an aim to minimize the scheduling length. The proposed algorithm, called Improved Predict Priority Task Scheduling (IPPTS) algorithm has two phases: task prioritization
-
Privacy-Preserving Similarity Search With Efficient Updates in Distributed Key-Value Stores IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-09 Wanyu Lin; Helei Cui; Baochun Li; Cong Wang
Privacy-preserving similarity search plays an essential role in data analytics, especially when very large encrypted datasets are stored in the cloud. Existing mechanisms on privacy-preserving similarity search were not able to support secure updates (addition and deletion) efficiently when frequent updates are needed. In this article, we propose a new mechanism to support parallel privacy-preserving
-
Distributed and Collective Deep Reinforcement Learning for Computation Offloading: A Practical Perspective IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-09 Xiaoyu Qiu; Weikun Zhang; Wuhui Chen; Zibin Zheng
Mobile edge computing (MEC) is a promising solution to support resource-constrained devices by offloading tasks to the edge servers. However, traditional approaches (e.g., linear programming and game-theory methods) for computation offloading mainly focus on the immediate performance, potentially leading to performance degradation in the long run. Recent breakthroughs regarding deep reinforcement learning
-
Subutai: Speeding Up Legacy Parallel Applications Through Data Synchronization IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-24 Rodrigo Cataldo; Ramon Fernandes; Kevin J. M. Martin; Jarbas Silveira; Gustavo Sanchez; Johanna Sepúlveda; César Marcon; Jean-Philippe Diguet
The decrease of the performance gain dictated by Moore's Law boosted the development of manycore architectures to replace single-core architectures. These new architectures must employ parallel applications and distribute their workload over a multitude of cores to reach the desired performance. Parallel applications are harder to develop than sequential ones since the developer must guarantee data integrity
-
Multi-Hop Multi-Task Partial Computation Offloading in Collaborative Edge Computing IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-12-03 Yuvraj Sahni; Jiannong Cao; Lei Yang; Yusheng Ji
Collaborative edge computing (CEC) is a recent popular paradigm where different edge devices collaborate by sharing data and computation resources. One of the fundamental issues in CEC is making task offloading decisions. However, this is a challenging problem to solve, as tasks can be offloaded to a device at multi-hop distance leading to conflicting network flows due to limited bandwidth constraint
-
Petrel: Heterogeneity-Aware Distributed Deep Learning Via Hybrid Synchronization IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-25 Qihua Zhou; Song Guo; Zhihao Qu; Peng Li; Li Li; Minyi Guo; Kun Wang
The parameter server (PS) paradigm has achieved great success in deploying large-scale distributed Deep Learning (DL) systems. However, these systems implicitly assume that the cluster is homogeneous and this assumption does not hold in many real-world cases. Although previous efforts have been made to address heterogeneity, they mainly prioritize the contribution of fast workers and reduce the involvement
-
Thermal Prediction for Efficient Energy Management of Clouds Using Machine Learning IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-26 Shashikant Ilager; Kotagiri Ramamohanarao; Rajkumar Buyya
Thermal management in hyper-scale cloud data centers is a critical problem. Increased host temperature creates hotspots, which significantly increase cooling cost and affect reliability. Accurate prediction of host temperature is crucial for managing the resources effectively. Temperature estimation is a non-trivial problem due to thermal variations in the data center. Existing solutions for temperature
-
Transformations of High-Level Synthesis Codes for High-Performance Computing IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-19 Johannes de Fine Licht; Maciej Besta; Simon Meierhans; Torsten Hoefler
Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target spatial
-
Analysis of Global and Local Synchronization in Parallel Computing IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-16 Franco Cicirelli; Andrea Giordano; Carlo Mastroianni
In a parallel computing scenario, the synchronization overhead, needed to coordinate the execution on the parallel computing nodes, can significantly impair the overall execution performance. Typically, synchronization is achieved by adopting a global synchronization schema involving all the nodes. In many application domains, though, a looser synchronization schema, namely, local synchronization,
-
Boosting Parallel Influence-Maximization Kernels for Undirected Networks With Fusing and Vectorization IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-16 Gökhan Göktürk; Kamer Kaya
Influence maximization (IM) is the problem of finding a seed vertex set which is expected to incur the maximum influence spread on a graph. It has various applications in practice such as devising an effective and efficient approach to disseminate information, news or ad within a social network. The problem is shown to be NP-hard and approximation algorithms with provable quality guarantees exist in
-
Coarse-Grained Parallel Routing With Recursive Partitioning for FPGAs IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-04 Minghua Shen; Guojie Luo; Nong Xiao
Routing is a very time-consuming stage in the FPGA design flow, significantly hindering productivity. This article proposes CPRS, a coarse-grained parallel routing scheme in a distributed computing environment. First, we partition the entire routing region to guide the assignment of nets for parallel processing. The partitioning proceeds in a recursive fashion, and at each recursive partitioning, the region
-
Canary: Decentralized Distributed Deep Learning Via Gradient Sketch and Partition in Multi-Interface Networks IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-09 Qihua Zhou; Kun Wang; Haodong Lu; Wenyao Xu; Yanfei Sun; Song Guo
The multi-interface networks are efficient infrastructures to deploy distributed Deep Learning (DL) tasks as the model gradients generated by each worker can be exchanged to others via different links in parallel. Although this decentralized parameter synchronization mechanism can reduce the time of gradient exchange, building a high-performance distributed DL architecture still requires the balance
-
Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-09 Rong Gu; Zhiqiang Zuo; Xi Jiang; Han Yin; Zhaokang Wang; Linzhang Wang; Xuandong Li; Yihua Huang
Static program analysis has been widely applied along the whole process of program development for bug detection, code optimization, testing, etc. Although researchers have made significant progress in static program analysis, it is still challenging to perform sophisticated interprocedural analysis on large-scale modern software. The underlying reason is that interprocedural analysis for large-scale
-
Resettable Encoded Vector Clock for Causality Analysis With an Application to Dynamic Race Detection IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-20 Tommaso Pozzetti; Ajay D. Kshemkalyani
Causality tracking among events is a fundamental challenge in distributed environments. Much previous work on this subject has focused on designing an efficient and scalable protocol to represent logical time. Several implementations of logical clocks have been proposed, most recently the Encoded Vector Clock (EVC), a protocol to encode Vector Clocks (VC) in scalar numbers through the use of prime
-
Accelerating Large-Scale Prioritized Graph Computations by Hotness Balanced Partition IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-21 Shufeng Gong; Yanfeng Zhang; Ge Yu
Prioritized computation has shown promising performance for a large class of graph algorithms. It prioritizes the execution of some vertices that play important roles in determining convergence. For large-scale distributed graph processing, graph partitioning is an important preprocessing step that aims to balance workload and to reduce communication costs between workers. However, existing graph partitioning
-
Homomorphic Sorting With Better Scalability IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-13 Gizem S. Çetin; Erkay Savaş; Berk Sunar
Homomorphic sorting is an operation that blindly sorts a given set of encrypted numbers without decrypting them (thus, there is no need for the secret key). In this article, we propose a new, efficient, and scalable method for homomorphic sorting of numbers: polynomial rank sort algorithm. To put the new algorithm in a comparative perspective, we provide an extensive survey of classical sorting algorithms
-
BOSSA: A Decentralized System for Proofs of Data Retrievability and Replication IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-12 Dian Chen; Haobo Yuan; Shengshan Hu; Qian Wang; Cong Wang
Proofs of retrievability and proofs of replication are two cryptographic tools that enable a remote server to prove that the users’ data has been correctly stored. Nevertheless, the literature either requires the users themselves to perform expensive verification jobs, or relies on a “fully trustworthy” third party auditor (TPA) to execute the public verification. In addition, none of the existing solutions
-
Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-20 Zichuan Xu; Liqian Zhao; Weifa Liang; Omer F. Rana; Pan Zhou; Qiufen Xia; Wenzheng Xu; Guowei Wu
With increasing focus on Artificial Intelligence (AI) applications, Deep Neural Networks (DNNs) have been successfully used in a number of application areas. As the number of layers and neurons in DNNs increases rapidly, significant computational resources are needed to execute a learned DNN model. This ever-increasing resource demand of DNNs is currently met by large-scale data centers with state-of-the-art
-
Achieving Probabilistic Atomicity with Well-Bounded Staleness and Low Read Latency in Distributed Datastores IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-28 Lingzhi Ouyang; Yu Huang; Hengfeng Wei; Jian Lu
Although it has been commercially successful to deploy weakly consistent but highly-responsive distributed datastores, the tension between developing complex applications and obtaining only weak consistency guarantees becomes more and more severe. The almost strong consistency tradeoff aims at achieving both strong consistency and low latency in the common case. In distributed storage systems, we investigate
-
Cuttlefish: Neural Configuration Adaptation for Video Analysis in Live Augmented Reality IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-10-30 Ning Chen; Siyi Quan; Sheng Zhang; Zhuzhong Qian; Yibo Jin; Jie Wu; Wenzhong Li; Sanglu Lu
Instead of relying on remote clouds, today’s Augmented Reality (AR) applications usually send videos to nearby edge servers for analysis (such as object detection) so as to optimize the user’s quality of experience (QoE), which is often determined by not only detection latency but also detection accuracy, playback fluency, etc. Therefore, many studies have been conducted to help adaptively choose
-
SEIZE: Runtime Inspection for Parallel Dataflow Systems IEEE Trans. Parallel Distrib. Syst. (IF 2.6) Pub Date : 2020-11-02 Youfu Li; Matteo Interlandi; Fotis Psallidas; Wei Wang; Carlo Zaniolo
Many Data-Intensive Scalable Computing (DISC) Systems provide easy-to-use functional APIs, and efficient scheduling and execution strategies allowing users to build concise data-parallel programs. In these systems, data transformations are concealed by exposed APIs, and intermediate execution states are masked under dataflow transitions. Consequently, many crucial features and optimizations (e.g.,