-
Benchmarking Distributed Coordination Systems: A Survey and Analysis arXiv.cs.DC Pub Date : 2024-03-14 Bekir Turkkan, Tevfik Kosar, Aleksey Charapko, Ailidani Ailijiang, Murat Demirbas
Coordination services and protocols are critical components of distributed systems and are essential for providing consistency, fault tolerance, and scalability. However, due to lack of a standard benchmarking tool for distributed coordination services, coordination service developers/researchers either use a NoSQL standard benchmark and omit evaluating consistency, distribution, and fault-tolerance;
-
Error-Free Near-Optimal Validated Agreement arXiv.cs.DC Pub Date : 2024-03-13 Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor Zablotchi
Byzantine agreement enables n processes to agree on a common L-bit value, despite t > 0 arbitrary failures. A long line of work has been dedicated to improving the worst-case bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) algorithm that achieves a near-optimal bit complexity of O(nL
-
Efficient Fault Tolerance for Pipelined Query Engines via Write-ahead Lineage arXiv.cs.DC Pub Date : 2024-03-12 Ziheng Wang, Alex Aiken
Modern distributed pipelined query engines either do not support intra-query fault tolerance or employ high-overhead approaches such as persisting intermediate outputs or checkpointing state. In this work, we present write-ahead lineage, a novel fault recovery technique that combines Spark's lineage-based replay and write-ahead logging. Unlike Spark, where the lineage is determined before query execution
-
SCALHEALTH: Scalable Blockchain Integration for Secure IoT Healthcare Systems arXiv.cs.DC Pub Date : 2024-03-12 Mehrzad Mohammadi, Reza Javan, Mohammad Beheshti-Atashgah, Mohammad Reza Aref
Internet of Things (IoT) devices are capable of allowing for far-reaching access to and evaluation of patient data to monitor health and diagnose from a distance. An electronic healthcare system that checks patient data, prepares medicines and provides financial assistance is necessary. Providing safe data transmission, monitoring, decentralization, preserving patient privacy, and maintaining confidentiality
-
Accelerating Biclique Counting on GPU arXiv.cs.DC Pub Date : 2024-03-12 Linshan Qiu, Zhonggen Li, Xiangyu Ke, Lu Chen, Yunjun Gao
Counting (p,q)-bicliques in bipartite graphs poses a foundational challenge with broad applications, from densest subgraph discovery in algorithmic research to personalized content recommendation in practical scenarios. Despite its significance, current leading (p,q)-biclique counting algorithms fall short, particularly when faced with larger graph sizes and clique scales. Fortunately, the problem's
-
MPCPA: Multi-Center Privacy Computing with Predictions Aggregation based on Denoising Diffusion Probabilistic Model arXiv.cs.DC Pub Date : 2024-03-12 Guibo Luo, Hanwen Zhang, Xiuling Wang, Mingzhi Chen, Yuesheng Zhu
Privacy-preserving computing is crucial for multi-center machine learning in many applications such as healthcare and finance. In this paper a Multi-center Privacy Computing framework with Predictions Aggregation (MPCPA) based on denoising diffusion probabilistic model (DDPM) is proposed, in which conditional diffusion model training, DDPM data generation, a classifier, and strategy of prediction aggregation
-
Measuring Data Similarity for Efficient Federated Learning: A Feasibility Study arXiv.cs.DC Pub Date : 2024-03-12 Fernanda Famá, Charalampos Kalalas, Sandra Lagen, Paolo Dini
In multiple federated learning schemes, a random subset of clients sends in each round their model updates to the server for aggregation. Although this client selection strategy aims to reduce communication overhead, it remains energy and computationally inefficient, especially when considering resource-constrained devices as clients. This is because conventional random client selection overlooks the
-
Polylog-Competitive Deterministic Local Routing and Scheduling arXiv.cs.DC Pub Date : 2024-03-12 Bernhard Haeupler, Shyamal Patel, Antti Roeyskoe, Cliff Stein, Goran Zuzic
This paper addresses point-to-point packet routing in undirected networks, which is the most important communication primitive in most networks. The main result proves the existence of routing tables that guarantee a polylog-competitive completion-time deterministically: in any undirected network, it is possible to give each node simple stateless deterministic local forwarding rules, such
-
Comparing Task Graph Scheduling Algorithms: An Adversarial Approach arXiv.cs.DC Pub Date : 2024-03-11 Jared Coleman, Bhaskar Krishnamachari
Scheduling a task graph representing an application over a heterogeneous network of computers is a fundamental problem in distributed computing. It is known to be not only NP-hard but also not polynomial-time approximable within a constant factor. As a result, many heuristic algorithms have been proposed over the past few decades. Yet it remains largely unclear how these algorithms compare to each
-
Parameterized Task Graph Scheduling Algorithm for Comparing Algorithmic Components arXiv.cs.DC Pub Date : 2024-03-11 Jared Coleman, Ravi Vivek Agrawal, Ebrahim Hirani, Bhaskar Krishnamachari
Scheduling distributed applications modeled as directed, acyclic task graphs to run on heterogeneous compute networks is a fundamental (NP-Hard) problem in distributed computing for which many heuristic algorithms have been proposed over the past decades. Many of these algorithms fall under the list-scheduling paradigm, whereby the algorithm first computes priorities for the tasks and then schedules
-
Atomicity and Abstraction for Cross-Blockchain Interactions arXiv.cs.DC Pub Date : 2024-03-12 Huaixi Lu, Akshay Jajoo, Kedar S. Namjoshi
A blockchain facilitates secure and atomic transactions between mutually untrusting parties on that chain. Today, there are multiple blockchains with differing interfaces and security properties. Programming in this multi-blockchain world is hindered by the lack of general and convenient abstractions for cross-chain communication and computation. Current cross-chain communication bridges have varied
-
Optimizing sDTW for AMD GPUs arXiv.cs.DC Pub Date : 2024-03-11 Daniel Latta-Lin, Sofia Isadora Padilla Munoz
Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors
-
Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning arXiv.cs.DC Pub Date : 2024-03-11 Liangkun Yu, Xiang Sun, Rana Albelaihi, Chaeeun Park, Sihua Shao
Federated Learning (FL) revolutionizes collaborative machine learning among Internet of Things (IoT) devices by enabling them to train models collectively while preserving data privacy. FL algorithms fall into two primary categories: synchronous and asynchronous. While synchronous FL efficiently handles straggler devices, it can compromise convergence speed and model accuracy. In contrast, asynchronous
-
Data Poisoning Attacks in Gossip Learning arXiv.cs.DC Pub Date : 2024-03-11 Alexandre Pham (NPA), Maria Potop-Butucaru (NPA), Sébastien Tixeuil (NPA, IUF), Serge Fdida (NPA)
Traditional machine learning systems were designed in a centralized manner. In such designs, the central entity maintains both the machine learning model and the data used to adjust the model's parameters. As data centralization yields privacy issues, Federated Learning was introduced to reduce data sharing and have a central server coordinate the learning of multiple devices. While Federated Learning
-
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU arXiv.cs.DC Pub Date : 2024-03-11 Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang
Recent advances in large language models have brought immense value to the world, with their superior capabilities stemming from the massive number of parameters they utilize. However, even the GPUs with the highest memory capacities, currently peaking at 80GB, are far from sufficient to accommodate these vast parameters and their associated optimizer states when conducting stochastic gradient descent-based
-
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping arXiv.cs.DC Pub Date : 2024-03-11 Seongyeon Park, Junguk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, Jinho Lee
With the advance in genome sequencing technology, the lengths of deoxyribonucleic acid (DNA) sequencing results are rapidly increasing at lower prices than ever. However, the longer lengths come at the cost of a heavy computational burden on aligning them. For example, aligning sequences to a human reference genome can take tens or even hundreds of hours. The current de facto standard approach for
-
Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors arXiv.cs.DC Pub Date : 2024-03-10 Andres E. Tomas, Enrique S. Quintana-Orti, Hartwig Anzt
We investigate the solution of low-rank matrix approximation problems using the truncated SVD. For this purpose, we develop and optimize GPU implementations for the randomized SVD and a blocked variant of the Lanczos approach. Our work takes advantage of the fact that the two methods are composed of very similar linear algebra building blocks, which can be assembled using numerical kernels from existing
-
A Two-Level Thermal Cycling-aware Task Mapping Technique for Reliability Management in Manycore Systems arXiv.cs.DC Pub Date : 2024-03-10 Fatemeh Hossein Khani, Omid Akbari, Muhammad Shafique
Reliability management is one of the primary concerns in manycore systems design. Different aging mechanisms such as Negative-Bias Temperature Instability (NBTI), Electromigration (EM), and thermal cycling can reduce the reliability of these systems. However, state-of-the-art works mainly focused on NBTI and EM, whereas a few works have considered the thermal cycling effect. The thermal cycling effect
-
Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments arXiv.cs.DC Pub Date : 2024-03-10 Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster
Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved, requires sophisticated "flows" defining the
-
Blockchain-Enhanced Offloading in Mobile Edge Computing: A Systematic Review and Survey of Current Trends and Future Directions arXiv.cs.DC Pub Date : 2024-03-09 Komeil Moghaddasi, Shakiba Rajabi
With the rapid growth of Internet of Things (IoT) applications, there's a big demand for more processing power and resources in devices. Mobile Edge Computing (MEC) looks promising for enhancing performance and reducing costs by offloading the computing work of IoT to MEC servers. However, the current methods for offloading have issues with privacy and security during the transfer of data and programs
-
DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud arXiv.cs.DC Pub Date : 2024-03-09 Yoochan Kim, Kihyun Kim, Yonghyeon Cho, Jinwoo Kim, Awais Khan, Ki-Dong Kang, Baik-Song An, Myung-Hoon Cha, Hong-Yeon Kim, Youngjae Kim
Distributed Deep Learning (DDL), as a paradigm, dictates the use of GPU-based clusters as the optimal infrastructure for training large-scale Deep Neural Networks (DNNs). However, the high cost of such resources makes them inaccessible to many users. Public cloud services, particularly Spot Virtual Machines (VMs), offer a cost-effective alternative, but their unpredictable availability poses a significant
-
Privacy-Preserving Sharing of Data Analytics Runtime Metrics for Performance Modeling arXiv.cs.DC Pub Date : 2024-03-08 Jonathan Will, Dominik Scheinert, Jan Bode, Cedric Kring, Seraphin Zunzer, Lauritz Thamsen
Performance modeling for large-scale data analytics workloads can improve the efficiency of cluster resource allocations and job scheduling. However, the performance of these workloads is influenced by numerous factors, such as job inputs and the assigned cluster resources. As a result, performance models require significant amounts of training data. This data can be obtained by exchanging runtime
-
SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions arXiv.cs.DC Pub Date : 2024-03-11 Gang Liao, Ye Liu, Yonghua Ding, Le Cai, Jianjun Chen
The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach, distilled into a mere 500 lines of code, effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors
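For readers unfamiliar with the format, the following is a minimal sketch of unsigned LEB128 itself: each byte carries 7 payload bits, least-significant group first, and the high bit marks that more bytes follow. This is the plain byte-at-a-time scalar scheme, not SFVInt's BMI2-based method, and the function names are hypothetical.

```python
from typing import Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 payload bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uleb128(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
```

The loop in `decode_uleb128` has a data-dependent branch per byte, which is the kind of scalar baseline that instruction-level approaches aim to improve on.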
-
Towards Data-center Level Carbon Modeling and Optimization for Deep Learning Inference arXiv.cs.DC Pub Date : 2024-03-08 Shixin Ji, Zhuoping Yang, Xingzhen Chen, Jingtong Hu, Yiyu Shi, Alex K. Jones, Peipei Zhou
Recently, the increasing need for computing resources has led to the prosperity of data centers, which poses challenges to the environmental impacts and calls for improvements in data center provisioning strategies. In this work, we show a comprehensive analysis based on profiling a variety of deep-learning inference applications on different generations of GPU servers. Our analysis reveals several
-
Optimizing CNN Using HPC Tools arXiv.cs.DC Pub Date : 2024-03-07 Shahrin Rahman
This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like OpenMPI and CUDA to speed up CNN model training. The approach improves performance and training time and is superior to alternative strategies. The study demonstrates how HPC technologies can refine the
-
A Survey on Adversarial Contention Resolution arXiv.cs.DC Pub Date : 2024-03-06 Ioana Banicescu, Trisha Chakraborty, Seth Gilbert, Maxwell Young
Contention resolution addresses the challenge of coordinating access by multiple processes to a shared resource such as memory, disk storage, or a communication channel. Originally spurred by challenges in database systems and bus networks, contention resolution has endured as an important abstraction for resource sharing, despite decades of technological change. Here, we survey the literature on resolving
-
Portable, heterogeneous ensemble workflows at scale using libEnsemble arXiv.cs.DC Pub Date : 2024-03-06 Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild
libEnsemble is a Python-based toolkit for running dynamic ensembles, developed as part of the DOE Exascale Computing Project. The toolkit utilizes a unique generator-simulator-allocator paradigm, where generators produce input for simulators, simulators evaluate those inputs, and allocators decide whether and when a simulator or generator should be called. The generator steers the ensemble based
-
Junctiond: Extending FaaS Runtimes with Kernel-Bypass arXiv.cs.DC Pub Date : 2024-03-06 Enrique Saurez, Joshua Fried, Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Sameh Elnikety, Adam Belay, Rodrigo Fonseca
This report explores the use of kernel-bypass networking in FaaS runtimes and demonstrates how using Junction, a novel kernel-bypass system, as the backend for executing components in faasd can enhance performance and isolation. Junction achieves this by reducing network and compute overheads and minimizing interactions with the host operating system. Junctiond, the integration of Junction with faasd
-
Federated Learning Using Coupled Tensor Train Decomposition arXiv.cs.DC Pub Date : 2024-03-05 Xiangtao Zhang, Eleftherios Kofidis, Ce Zhu, Le Zhang, Yipeng Liu
Coupled tensor decomposition (CTD) can extract joint features from multimodal data in various applications. It can be employed for federated learning networks with data confidentiality. Federated CTD achieves data privacy protection by sharing common features and keeping individual features. However, traditional CTD schemes based on canonical polyadic decomposition (CPD) may suffer from low computational
-
Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators arXiv.cs.DC Pub Date : 2024-03-05 Yehonatan Fridman, Guy Tamir, Uri Steinitz, Gal Oren
Monte Carlo (MC) simulations play a pivotal role in diverse scientific and engineering domains, with applications ranging from nuclear physics to materials science. Harnessing the computational power of high-performance computing (HPC) systems, especially Graphics Processing Units (GPUs), has become essential for accelerating MC simulations. This paper focuses on the adaptation and optimization of
-
Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization arXiv.cs.DC Pub Date : 2024-03-04 Morgan Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen
Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scaleout configurations which maximize resource utilization remains a challenge. This is especially
-
Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems arXiv.cs.DC Pub Date : 2024-03-04 Benjamin J. J. Pfister, Dominik Scheinert, Morgan K. Geldenhuys, Odej Kao
Distributed Stream Processing (DSP) systems are capable of processing large streams of unbounded data, offering high throughput and low latencies. To maintain a stable Quality of Service (QoS), these systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling
-
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving arXiv.cs.DC Pub Date : 2024-03-04 Foteini Strati, Sara Mcallister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic
Distributed LLM serving is costly and often underutilizes hardware accelerators due to three key challenges: bubbles in pipeline-parallel deployments caused by the bimodal latency of prompt and token processing, GPU memory overprovisioning, and long recovery times in case of failures. In this paper, we propose DéjàVu, a system to address all these challenges using a versatile and efficient KV cache
-
A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning arXiv.cs.DC Pub Date : 2024-03-01 Chenghao Lyu, Qi Fan, Philippe Guyard, Yanlei Diao
As Spark becomes a common big data analytics platform, its growing complexity makes automatic tuning of numerous parameters critical for performance. Our work on Spark parameter tuning is particularly motivated by two recent trends: Spark's Adaptive Query Execution (AQE) based on runtime statistics, and the increasingly popular Spark cloud deployments that make cost-performance reasoning crucial for
-
Are Unikernels Ready for Serverless on the Edge? arXiv.cs.DC Pub Date : 2024-03-01 Felix Moebius, Tobias Pfandzelter, David Bermbach
Function-as-a-Service (FaaS) is a promising edge computing execution model but requires secure sandboxing mechanisms to isolate workloads from multiple tenants on constrained infrastructure. Although Docker containers are lightweight and popular in open-source FaaS platforms, they are generally considered insufficient for executing untrusted code and providing sandbox isolation. Commercial cloud FaaS
-
Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability arXiv.cs.DC Pub Date : 2024-03-01 Qingyuan Liu, Yanning Yang, Dong Du, Yubin Xia, Ping Zhang, Jia Feng, James Larus, Haibo Chen
Current serverless platforms struggle to optimize resource utilization due to their dynamic and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall short, often sacrificing utilization for practicability or incurring performance trade-offs. Overcommitment requires predicting performance to prevent QoS violation, introducing trade-off between prediction accuracy and
-
WindGP: Efficient Graph Partitioning on Heterogenous Machines arXiv.cs.DC Pub Date : 2024-03-01 Li Zeng, Haohan Huang, Binfan Zheng, Kang Yang, Shengcheng Shao, Jinhua Zhou, Jun Xie, Rongqian Zhao, Xin Chen
Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider replication
-
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication arXiv.cs.DC Pub Date : 2024-02-29 Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboos, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler
We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single compute node, data transfer becomes a bottleneck. An approach based on dense matrix multiplication algorithms leads to suboptimal scalability and fails to exploit the
-
Libfork: portable continuation-stealing with stackless coroutines arXiv.cs.DC Pub Date : 2024-02-28 Conor John Williams, James Elliott
Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages -- where it is often impossible without modifying the compiler or resorting to non-portable techniques
-
The Logarithmic Random Bidding for the Parallel Roulette Wheel Selection with Precise Probabilities arXiv.cs.DC Pub Date : 2024-02-28 Koji Nakano
The roulette wheel selection is a critical process in heuristic algorithms, enabling the probabilistic choice of items based on assigned fitness values. It selects an item with a probability proportional to its fitness value. This technique is commonly employed in ant-colony algorithms to randomly determine the next city to visit when solving the traveling salesman problem. Our study focuses on parallel
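As a point of reference, fitness-proportional selection can be sketched sequentially with a prefix sum and a binary search. This is the standard textbook scheme, not the logarithmic random bidding method the paper studies, and `roulette_select` is a hypothetical name.

```python
import bisect
import itertools
import random
from typing import Sequence


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return index i with probability fitness[i] / sum(fitness)."""
    if not fitness or any(f < 0 for f in fitness):
        raise ValueError("fitness must be a non-empty list of non-negative values")
    prefix = list(itertools.accumulate(fitness))  # running totals
    total = prefix[-1]
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    r = rng.random() * total                      # uniform point on the wheel
    # First index whose running total exceeds r; zero-fitness items are skipped.
    index = bisect.bisect_right(prefix, r)
    # Guard against floating-point rounding pushing r up to exactly `total`.
    return min(index, len(fitness) - 1)
```

Each item occupies a slice of the interval [0, total) whose width equals its fitness, so a uniformly drawn point lands in slice i with the desired probability.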
-
The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs arXiv.cs.DC Pub Date : 2024-02-28 Jinhong Li, Qiuping Wang, Shujie Han, Patrick P. C. Lee
Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly manage storage in flash-based SSDs as append-only zones. It also provides a Zone Append primitive to further boost the write performance of ZNS SSDs by exploiting intra-zone parallelism. However, making Zone Append effective for reliable and scalable storage, in the form of a RAID array of multiple ZNS SSDs, is non-trivial
-
SmartQC: An Extensible DLT-Based Framework for Trusted Data Workflows in Smart Manufacturing arXiv.cs.DC Pub Date : 2024-02-27 Alan McGibney, Tharindu Ranathunga, Roman Pospisil
Recent developments in Distributed Ledger Technology (DLT), including Blockchain offer new opportunities in the manufacturing domain, by providing mechanisms to automate trust services (digital identity, trusted interactions, and auditable transactions) and when combined with other advanced digital technologies (e.g. machine learning) can provide a secure backbone for trusted data flows between independent
-
TrustRate: A Decentralized Platform for Hijack-Resistant Anonymous Reviews arXiv.cs.DC Pub Date : 2024-02-28 Rohit Dwivedula, Sriram Sridhar, Sambhav Satija, Muthian Sivathanu, Nishanth Chandran, Divya Gupta, Satya Lokam
Reviews and ratings by users form a central component in several widely used products today (e.g., product reviews, ratings of online content, etc.), but today's platforms for managing such reviews are ad-hoc and vulnerable to various forms of tampering and hijack by fake reviews either by bots or motivated paid workers. We define a new metric called 'hijack-resistance' for such review platforms, and
-
Highly-Efficient Persistent FIFO Queues arXiv.cs.DC Pub Date : 2024-02-27 Panagiota Fatourou, Nikos Giachoudis, George Mallis
In this paper, we study the question whether techniques employed, in a conventional system, by state-of-the-art concurrent algorithms to avoid contended hot spots are still efficient for recoverable computing in settings with Non-Volatile Memory (NVM). We focus on concurrent FIFO queues that have two end-points, head and tail, which are highly contended. We present a persistent FIFO queue implementation
-
Navigator: A Decentralized Scheduler for Latency-Sensitive ML Workflows arXiv.cs.DC Pub Date : 2024-02-27 Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Navigator, a novel framework
-
Deep Reinforcement Learning (DRL)-based Methods for Serverless Stream Processing Engines: A Vision, Architectural Elements, and Future Directions arXiv.cs.DC Pub Date : 2024-02-27 Maria R. Read, Chinmaya Dehury, Satish Narayana Srirama, Rajkumar Buyya
Streaming applications are becoming widespread across an extensive range of business domains as an increasing number of sources continuously produce data that need to be processed and analysed in real time. Modern businesses are aggressively using streaming data to generate valuable knowledge that can be used to automate processes, help decision-making, optimize resource usage, and ultimately generate
-
SoK: Cryptocurrency Wallets -- A Security Review and Classification based on Authentication Factors arXiv.cs.DC Pub Date : 2024-02-27 Ivan Homoliak, Martin Perešíni
In this work, we review existing cryptocurrency wallet solutions with regard to authentication methods and factors from the user's point of view. In particular, we distinguish between authentication factors that are verified against the blockchain and the ones verified locally (or against a centralized party). With this in mind, we define notions for $k$-factor authentication against the blockchain
-
PureLottery: Fair and Bias-Resistant Leader Election with a Novel Single-Elimination Tournament Algorithm arXiv.cs.DC Pub Date : 2024-02-27 Jonas Ballweg
Leader Election (LE) is crucial in distributed systems and blockchain technology, ensuring one participant acts as the leader. Traditional LE methods often depend on distributed random number generation (RNG), facing issues like vulnerability to manipulation, lack of fairness, and the need for complex procedures such as verifiable delay functions (VDFs) and publicly-verifiable secret sharing (PVSS)
-
A Scalable Multi-Layered Blockchain Architecture for Enhanced EHR Sharing and Drug Supply Chain Management arXiv.cs.DC Pub Date : 2024-02-27 Reza Javan, Mehrzad Mohammadi, Mohammad Beheshti-Atashgah, Mohammad Reza Aref
In recent years, the healthcare sector's shift to online platforms has spotlighted challenges concerning data security, privacy, and scalability. Blockchain technology, known for its decentralized, secure, and immutable nature, emerges as a viable solution for these pressing issues. This article presents an innovative Electronic Health Records (EHR) sharing and drug supply chain management framework
-
Auto Tuning for OpenMP Dynamic Scheduling applied to FWI arXiv.cs.DC Pub Date : 2024-02-26 Felipe H. S. da Silva, João B. Fernandes, Idalmis M. Sardina, Tiago Barros, Samuel Xavier-de-Souza, Italo A. S. Assis
Because Full Waveform Inversion (FWI) works with a massive amount of data, its execution requires much time and computational resources, being restricted to large-scale computer systems such as supercomputers. Techniques such as FWI adapt well to parallel computing and can be parallelized in shared memory systems using the application programming interface (API) OpenMP. The management of parallel tasks
-
CoGenT: A Content-oriented Generative-hit Framework for Content Delivery Networks arXiv.cs.DC Pub Date : 2024-02-26 Peng Wang, Yu Liu, Ziqi Liu, Ming-Yang Wang, Ke Liu, Ke Zhou, Zhihai Huang
The service provided by content delivery networks (CDNs) may overlook content locality, leaving room for potential performance improvement. In this study, we explore the feasibility of leveraging generated data as a replacement for fetching data in missing scenarios based on content locality. Due to sufficient local computing resources and reliable generation efficiency, we propose a content-oriented
-
Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding arXiv.cs.DC Pub Date : 2024-02-24 Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Yongjun Bao, Yi Liu, Zhongzhi Luan, Depei Qian
Large language models (LLM) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging due to its autoregressive decoding that generates tokens only one at a time. Although research works apply pruning or quantization to speed up LLM inference, they typically require fine-tuning the LLM, incurring significant
-
FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology arXiv.cs.DC Pub Date : 2024-02-24 Yuanzhe Peng, Jieming Bian, Jie Xu
The fusion of complementary multimodal information is crucial in computational pathology for accurate diagnostics. However, existing multimodal learning approaches necessitate access to users' raw data, posing substantial privacy risks. While Federated Learning (FL) serves as a privacy-preserving alternative, it falls short in addressing the challenges posed by heterogeneous (yet possibly overlapped)
-
PICO: Accelerating All k-Core Paradigms on GPU arXiv.cs.DC Pub Date : 2024-02-23 Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao
Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decomposition
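To make the problem concrete, here is a minimal sequential sketch of core decomposition by bottom-up peeling: repeatedly remove a minimum-degree vertex and record the largest degree seen at removal time as its core number. It assumes an undirected graph given as a symmetric adjacency dictionary without self-loops, it is deliberately simple rather than fast, and it is not PICO's GPU algorithm; `core_numbers` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable


def core_numbers(adj: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Return the core number of every vertex of an undirected graph."""
    degree = {v: len(set(neighbors)) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree (linear scan for clarity).
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])      # core numbers never decrease while peeling
        core[v] = k
        remaining.remove(v)
        for u in set(adj[v]):
            if u in remaining:
                degree[u] -= 1     # v no longer counts toward u's degree
    return core
```

For a triangle with one extra pendant vertex, the pendant vertex gets core number 1 and the three triangle vertices get core number 2.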
-
Trustworthy confidential virtual machines for the masses arXiv.cs.DC Pub Date : 2024-02-23 Anna Galanou, Khushboo Bindlish, Luca Preibsch, Yvonne-Anne Pignolet, Christof Fetzer, Rüdiger Kapitza
Confidential computing alleviates the concerns of distrustful customers by removing the cloud provider from their trusted computing base and resolves their disincentive to migrate their workloads to the cloud. This is facilitated by new hardware extensions, like AMD's SEV Secure Nested Paging (SEV-SNP), which can run a whole virtual machine with confidentiality and integrity protection against a potentially
-
Seer: Proactive Revenue-Aware Scheduling for Live Streaming Services in Crowdsourced Cloud-Edge Platforms arXiv.cs.DC Pub Date : 2024-02-22 Shaoyuan Huang, Zheng Wang, Zhongtian Zhang, Heng Zhang, Xiaofei Wang, Wenyu Wang
As live streaming services skyrocket, Crowdsourced Cloud-edge service Platforms (CCPs) have surfaced as pivotal intermediaries catering to the mounting demand. Despite the role of stream scheduling in CCPs' Quality of Service (QoS) and throughput, conventional optimization strategies struggle to enhance CCPs' revenue, primarily due to the intricate relationship between resource utilization and revenue
-
Stand-Up Indulgent Gathering on Rings arXiv.cs.DC Pub Date : 2024-02-22 Quentin Bramas, Sayaka Kamei, Anissa Lamani, Sébastien Tixeuil
We consider a collection of $k \geq 2$ robots that evolve in a ring-shaped network without common orientation, and address a variant of the crash-tolerant gathering problem called the Stand-Up Indulgent Gathering (SUIG): given a collection of robots, if no robot crashes, robots have to meet at the same arbitrary location, not known beforehand, in finite time; if one robot or more robots crash
-
Towards singular optimality in the presence of local initial knowledge arXiv.cs.DC Pub Date : 2024-02-22 Hongyan Ji, Sriram V. Pemmaraju
The Knowledge Till rho CONGEST model is a variant of the classical CONGEST model of distributed computing in which each vertex v has initial knowledge of the radius-rho ball centered at v. The most commonly studied variants of the CONGEST model are KT0 CONGEST in which nodes initially know nothing about their neighbors and KT1 CONGEST in which nodes initially know the IDs of all their neighbors. It
-
Efficient Wait-Free Linearizable Implementations of Approximate Bounded Counters Using Read-Write Registers arXiv.cs.DC Pub Date : 2024-02-21 Colette Johnen, Adnane Khattabi, Alessia Milani, Jennifer L. Welch
Relaxing the sequential specification of a shared object is a way to obtain an implementation with better performance compared to implementing the original specification. We apply this approach to the Counter object, under the assumption that the number of times the Counter is incremented in any execution is at most a known bound $m$. We consider the $k$-multiplicative-accurate Counter object, where
-
Unveiling Crowdfunding Futures: Analyzing Campaign Outcomes through Distributed Models and Big Data Perspectives arXiv.cs.DC Pub Date : 2024-02-21 Giuseppe Pipitò, Emanuele Macca
Crowdfunding has emerged as a widespread strategy for startups seeking financing, particularly through reward-based methods. However, understanding its economic impact at both micro and macro levels requires thorough analysis, often involving advanced studies on past campaigns to extract insights that aid companies in optimizing their crowdfunding project types and launch methodologies. Such analyses