Current journal: arXiv - CS - Performance
  • Stability, memory, and messaging tradeoffs in heterogeneous service systems
    arXiv.cs.PF Pub Date : 2020-07-10
    David Gamarnik; John N. Tsitsiklis; Martin Zubeldia

    We consider a heterogeneous distributed service system, consisting of $n$ servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate $\lambda n$, with $0<\lambda<1$, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the $n$ servers. We assume that the dispatching

  • Accuracy vs. Complexity for mmWave Ray-Tracing: A Full Stack Perspective
    arXiv.cs.PF Pub Date : 2020-07-14
    Mattia Lecci; Paolo Testolina; Michele Polese; Marco Giordani; Michele Zorzi

    The millimeter wave (mmWave) band will provide multi-gigabits-per-second connectivity in the radio access of future wireless systems. The high propagation loss in this portion of the spectrum calls for the deployment of large antenna arrays to compensate for the loss through high directional gain, thus introducing a spatial dimension in the channel model to accurately represent the performance of a

  • Self-healing Dilemmas in Distributed Systems: Fault-correction vs. Fault-tolerance
    arXiv.cs.PF Pub Date : 2020-07-10
    Jovan Nikolic; Nursultan Jubatyrov; Evangelos Pournaras

    Large-scale decentralized systems of autonomous agents interacting via asynchronous communication often experience the following self-healing dilemma: Fault-detection inherits network uncertainties making a faulty process indistinguishable from a slow process. The implications can be dramatic: Self-healing mechanisms become biased and cost-ineffective. In particular, triggering an undesirable fault-correction

  • Accurate Closed-Form Approximations to Channel Distributions of RIS-Aided Wireless Systems
    arXiv.cs.PF Pub Date : 2020-07-10
    Liang Yang; Fanxu Meng; Qingqing Wu; Daniel Benevides da Costa; Mohamed-Slim Alouini

    This paper proposes highly accurate closed-form approximations to channel distributions of two different reconfigurable intelligent surface (RIS)-based wireless system setups, namely, the dual-hop RIS-aided (RIS-DH) scheme and the RIS-aided transmit (RIS-T) scheme. Unlike previous works, the proposed approximations prove to be very tight for an arbitrary number $N$ of reflecting metasurface elements

  • Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU
    arXiv.cs.PF Pub Date : 2020-07-09
    Filippo Mantovani; Marta Garcia-Gasulla; José Gracia; Esteban Stafford; Fabio Banchelli; Marc Josep-Fabrego; Joel Criado-Ledesma; Mathias Nachtmann

    In this paper, we analyze the performance and energy consumption of an Arm-based high-performance computing (HPC) system developed within the European project Mont-Blanc 3. This system, called Dibona, has been integrated by ATOS/Bull, and it is powered by Marvell's latest CPU, ThunderX2. This CPU is the same one that powers the Astra supercomputer, the first Arm-based supercomputer entering the

  • High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers
    arXiv.cs.PF Pub Date : 2020-07-07
    Maciej Besta; Jens Domke; Marcel Schneider; Marek Konieczny; Salvatore Di Girolamo; Timo Schneider; Ankit Singla; Torsten Hoefler

    The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On one hand, these networks provide shorter path lengths than established topologies such as Clos

  • On the Efficiency of Decentralized File Storage for Personal Information Management Systems
    arXiv.cs.PF Pub Date : 2020-07-07
    Mirko Zichichi; Stefano Ferretti; Gabriele D'Angelo

    This paper presents an architecture, based on Distributed Ledger Technologies (DLTs) and Decentralized File Storage (DFS) systems, to support the use of Personal Information Management Systems (PIMS). DLT and DFS are used to manage data sensed by mobile users equipped with devices with sensing capability. DLTs guarantee the immutability, traceability and verifiability of references to personal data

  • Benchmarking in Optimization: Best Practice and Open Issues
    arXiv.cs.PF Pub Date : 2020-07-07
    Thomas Bartz-Beielstein; Carola Doerr; Jakob Bossek; Sowmya Chandrasekaran; Tome Eftimov; Andreas Fischbach; Pascal Kerschke; Manuel Lopez-Ibanez; Katherine M. Malan; Jason H. Moore; Boris Naujoks; Patryk Orzechowski; Vanessa Volz; Markus Wagner; Thomas Weise

    This survey compiles ideas and recommendations from more than a dozen researchers with different backgrounds and from different institutes around the world. Promoting best practice in benchmarking is its main goal. The article discusses eight essential topics in benchmarking: clearly stated goals, well-specified problems, suitable algorithms, adequate performance measures, thoughtful analysis, effective

  • Analytics of Longitudinal System Monitoring Data for Performance Prediction
    arXiv.cs.PF Pub Date : 2020-07-07
    Ian J. Costello; Abhinav Bhatele

    In recent years, several HPC facilities have started continuous monitoring of their systems and jobs to collect performance-related data for understanding performance and operational efficiency. Such data can be used to optimize the performance of individual jobs and the overall system by creating data-driven models that can predict the performance of pending jobs. In this paper, we model the performance

  • Cost-Efficient Storage for On-Demand Video Streaming on Cloud
    arXiv.cs.PF Pub Date : 2020-07-07
    Mahmoud Darwich; Yasser Ismail; Talal Darwich; Magdy Bayoumi

    A video stream is converted to several formats to support the user's device. This conversion process, called video transcoding, imposes high storage demands and requires powerful resources. With the emergence of cloud technology, video streaming companies have moved to processing video on the cloud. Generally, many formats of the same video are made (pre-transcoded) and the appropriate one is streamed to the user's device. However, pre-transcoding

  • The prolonged service time at non-dedicated servers in a pooling system
    arXiv.cs.PF Pub Date : 2020-07-03
    Yanting Chen; Jingui Xie; Taozeng Zhu

    In this paper, we investigate the effect of the prolonged service time at the non-dedicated servers in a pooling system on the system performance. We consider the two-server loss model with exponential interarrival and service times. We show that if the ratio of the mean service time at the dedicated server and the mean prolonged service time at the non-dedicated server exceeds a certain threshold

  • A Machine Learning Pipeline Stage for Adaptive Frequency Adjustment
    arXiv.cs.PF Pub Date : 2020-07-02
    Arash Fouman Ajirlou; Inna Partin-Vaisband

    A machine learning (ML) design framework is proposed for adaptively adjusting clock frequency based on propagation delay of individual instructions. A random forest model is trained to classify propagation delays in real time, utilizing current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within

  • A New Theoretical Framework of Pyramid Markov Processes for Blockchain Selfish Mining
    arXiv.cs.PF Pub Date : 2020-07-03
    Quan-Lin Li; Yan-Xia Chang; Xiaole Wu; Guoqing Zhang

    In this paper, we provide a new theoretical framework of pyramid Markov processes to solve some open and fundamental problems of blockchain selfish mining. To this end, we first describe a more general blockchain selfish mining with both a two-block leading competitive criterion and a new economic incentive, and establish a pyramid Markov process to express the dynamic behavior of the selfish mining

  • Scalable Comparative Visualization of Ensembles of Call Graphs
    arXiv.cs.PF Pub Date : 2020-07-01
    Suraj P. Kesavan; Harsh Bhatia; Abhinav Bhatele; Todd Gamblin; Peer-Timo Bremer; Kwan-Liu Ma

    Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources. Code developers often explore various execution parameters, such as hardware configurations, system software choices, and application parameters, and are interested in detecting and understanding bottlenecks in different executions. They often collect hierarchical performance profiles

  • COCOA: Cold Start Aware Capacity Planning for Function-as-a-Service Platforms
    arXiv.cs.PF Pub Date : 2020-07-02
    Alim Ul Gias; Giuliano Casale

    Function-as-a-Service (FaaS) is increasingly popular in the software industry due to the implied cost-savings in event-driven workloads and its synergy with DevOps. To size an on-premise FaaS platform, it is important to estimate the required CPU and memory capacity to serve the expected loads. Given the service-level agreements, it is however challenging to take the cold start issue into account during

  • HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems
    arXiv.cs.PF Pub Date : 2020-07-01
    Zihan Jiang; Lei Wang; Xingwang Xiong; Wanling Gao; Chunjie Luo; Fei Tang; Chuanxin Lan; Hongxiao Li; Jianfeng Zhan

    Recent years have witnessed a trend of applying large-scale distributed deep learning in both business and scientific computing, with the goal of reducing training time while achieving state-of-the-art quality. The HPC community has shown great interest in building HPC AI systems dedicated to running those workloads. HPC AI benchmarks accelerate the process. Unfortunately, benchmarking

  • Benchmarking for Metaheuristic Black-Box Optimization: Perspectives and Open Challenges
    arXiv.cs.PF Pub Date : 2020-07-01
    Ramses Sala; Ralf Müller

    Research on new optimization algorithms is often funded based on the motivation that such algorithms might improve the capabilities to deal with real-world and industrially relevant optimization challenges. Besides a huge variety of different evolutionary and metaheuristic optimization algorithms, also a large number of test problems and benchmark suites have been developed and used for comparative

  • Probabilistic Bounds on the End-to-End Delay of Service Function Chains using Deep MDN
    arXiv.cs.PF Pub Date : 2020-06-29
    Majid Raeis; Ali Tizghadam; Alberto Leon-Garcia

    Ensuring the conformance of a service system's end-to-end delay to service level agreement (SLA) constraints is a challenging task that requires statistical measures beyond the average delay. In this paper, we study the real-time prediction of the end-to-end delay distribution in systems with composite services such as service function chains. In order to have a general framework, we use queueing theory

  • Queues with Small Advice
    arXiv.cs.PF Pub Date : 2020-06-27
    Michael Mitzenmacher

    Motivated by recent work on scheduling with predicted job sizes, we consider the performance of scheduling algorithms with minimal advice, namely a single bit. Besides demonstrating the power of very limited advice, such schemes are quite natural. In the prediction setting, one bit of advice can be used to model a simple prediction as to whether a job is "large" or "small"; that is, whether a job is

  • GPU-Accelerated Discontinuous Galerkin Methods: 30x Speedup on 345 Billion Unknowns
    arXiv.cs.PF Pub Date : 2020-06-28
    Andrew C. Kirby; Dimitri J. Mavriplis

    A discontinuous Galerkin method for the discretization of the compressible Euler equations, the governing equations of inviscid fluid dynamics, on Cartesian meshes is developed for use of Graphical Processing Units via OCCA, a unified approach to performance portability on multi-threaded hardware architectures. A 30x speedup over CPU-only implementations using non-CUDA-Aware MPI communications is demonstrated

  • Scalable Load Balancing in the Presence of Heterogeneous Servers
    arXiv.cs.PF Pub Date : 2020-06-24
    Kristen Gardner; Jazeem Abdul Jaleel; Alexander Wickeham; Sherwin Doroudi

    Heterogeneity is becoming increasingly ubiquitous in modern large-scale computer systems. Developing good load balancing policies for systems whose resources have varying speeds is crucial in achieving low response times. Indeed, how best to dispatch jobs to servers is a classical and well-studied problem in the queueing literature. Yet the bulk of existing work on large-scale systems assumes homogeneous

  • Design And Develop Network Storage Virtualization By Using GNS3
    arXiv.cs.PF Pub Date : 2020-06-24
    Abdul Ahad Abro; Ufaque Shaikh

    Virtualization is an emerging and promising prospect in the IT industry, and it has a wide footprint in digital infrastructure. Many innovative sectors have used the concept of virtualization to reduce the cost of frameworks. In this paper, we have designed and developed storage virtualization for physical functional solutions. It is a promising type of virtualization that is accessible

  • Guiding Optimizations with Meliora: A Deep Walk down Memory Lane
    arXiv.cs.PF Pub Date : 2020-06-09
    Kewen Meng; Boyana Norris

    Performance models can be very useful for understanding the behavior of applications and hence can help guide design and optimization decisions. Unfortunately, performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both. However, since models are not typically

  • Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS
    arXiv.cs.PF Pub Date : 2020-06-16
    Szilárd Páll; Artem Zhmurov; Paul Bauer; Mark Abraham; Magnus Lundborg; Alan Gray; Berk Hess; Erik Lindahl

    The introduction of accelerator devices such as graphics processing units (GPUs) has had profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we present

  • Ansor: Generating High-Performance Tensor Programs for Deep Learning
    arXiv.cs.PF Pub Date : 2020-06-11
    Lianmin Zheng; Chengfan Jia; Minmin Sun; Zhao Wu; Cody Hao Yu; Ameer Haj-Ali; Yida Wang; Jun Yang; Danyang Zhuo; Koushik Sen; Joseph Gonzalez; Ion Stoica

    High-performance tensor programs are crucial to guarantee efficient execution of deep learning models. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously difficult. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require significant

  • A Note on Multiple-Processor Multitask Scheduling
    arXiv.cs.PF Pub Date : 2020-06-11
    Wenxin Li; Ness Shroff

    In this paper we study the multiple-processor multitask scheduling problem in the deterministic and stochastic models. We consider and analyze M-SRPT, a simple modification of the shortest remaining processing time algorithm, which always schedules jobs according to SRPT whenever possible, while processing tasks in an arbitrary order. The modified SRPT algorithm is shown to achieve a competitive ratio

  • Product Forms for FCFS Queueing Models with Arbitrary Server-Job Compatibilities: An Overview
    arXiv.cs.PF Pub Date : 2020-06-10
    Kristen Gardner; Rhonda Righter

    In recent years a number of models involving different compatibilities between jobs and servers in queueing systems, or between agents and resources in matching systems, have been studied, and, under Markov assumptions and appropriate stability conditions, the stationary distributions have been shown to have product forms. We survey these results and show how, under an appropriate detailed description

  • Scalability in Computing and Robotics
    arXiv.cs.PF Pub Date : 2020-06-08
    Heiko Hamann; Andreagiovanni Reina

    Efficient engineered systems require scalability. A scalable system has increasing performance with increasing system size. In an ideal case, the increase in performance (e.g., speedup) corresponds to the number of units that are added to the system. However, if multiple units work on the same task, then coordination among these units is required. This coordination can introduce overheads with an impact

  • Toward a Better Understanding and Evaluation of Tree Structures on Flash SSDs
    arXiv.cs.PF Pub Date : 2020-06-08
    Diego Didona; Nikolas Ioannou; Radu Stoica; Kornilios Kourtis

    Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they lie at the backbone of many of the data management systems used in production and research today. In this paper, we show that benchmarking

  • Stochastic Automata Network for Performance Evaluation of Heterogeneous SoC Communication
    arXiv.cs.PF Pub Date : 2020-06-07
    Ulhas Deshmukh; Vineet Sahula

    To meet the ever-increasing demand for performance of emerging System-on-Chip (SoC) applications, designers employ techniques for concurrent communication between components. Hence, the communication architecture becomes complex and a major performance bottleneck. An early performance evaluation of the communication architecture is the key to reducing design time, time-to-market and consequently the cost of the system.

  • High-level Modeling of Manufacturing Faults in Deep Neural Network Accelerators
    arXiv.cs.PF Pub Date : 2020-06-05
    Shamik Kundu; Ahmet Soyyigit; Khaza Anuarul Hoque; Kanad Basu

    The advent of data-driven real-time applications requires the implementation of Deep Neural Networks (DNNs) on Machine Learning accelerators. Google's Tensor Processing Unit (TPU) is one such neural network accelerator that uses systolic array-based matrix multiplication hardware for computation in its crux. Manufacturing faults at any state element of the matrix multiplication unit can cause unexpected

  • Joint performance analysis of ages of information in a multi-source pushout server
    arXiv.cs.PF Pub Date : 2020-06-05
    Yukang Jiang; Naoto Miyoshi

    Age of information (AoI) has been widely accepted as a measure quantifying the freshness of status information in real-time status update systems. In many such systems, multiple sources share a limited network resource and therefore the AoIs defined for the individual sources should be correlated with each other. However, no results have been found that study the correlation of two or more AoIs in

  • Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
    arXiv.cs.PF Pub Date : 2020-06-05
    Hongyu Zhu; Amar Phanishayee; Gennady Pekhimenko

    Modern deep neural network (DNN) training jobs use complex and heterogeneous software/hardware stacks. The efficacy of software-level optimizations can vary significantly when used in different deployment configurations. It is onerous and error-prone for ML practitioners and system developers to implement each optimization separately, and determine which ones will improve performance in their own configurations

  • Unstable Throughput: When the Difficulty Algorithm Breaks
    arXiv.cs.PF Pub Date : 2020-06-04
    Sam M. Werner; Dragos I. Ilie; Iain Stewart; William J. Knottenbelt

    Difficulty algorithms are a fundamental component of Proof-of-Work blockchains, aimed at maintaining stable block production times by dynamically adjusting the network difficulty in response to the miners' constantly changing computational power. Targeting stable block times is critical, as this ensures consistent transaction throughput. Some blockchains need difficulty algorithms that react quickly

  • Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms
    arXiv.cs.PF Pub Date : 2020-06-04
    Weicheng Xue; Christopher J. Roy

    This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU. Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple GPUs due

  • Efficient Replication for Straggler Mitigation in Distributed Computing
    arXiv.cs.PF Pub Date : 2020-06-03
    Amir Behrouzi-Far; Emina Soljanin

    The potential of distributed computing to improve the performance of big data processing engines is contingent on mitigation of several challenges. In particular, by relying on multiple commodity servers, the performance of a distributed computing engine is dictated by the slowest servers, known as stragglers. Redundancy could mitigate stragglers by reducing the dependence of the computing engine on

  • The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms
    arXiv.cs.PF Pub Date : 2020-06-03
    Davood Ghatreh Samani; Chavit Denninnart; Josef Bacik; Mohsen Amini Salehi

    Cloud providers offer a variety of execution platforms in the form of bare-metal, VM, and containers. However, due to the pros and cons of each execution platform, choosing the appropriate platform for a specific cloud-based application has become a challenge for solution architects. The possibility to combine these platforms (e.g. deploying containers within VMs) offers new capacities that make the challenge

  • Detecting and Understanding Real-World Differential Performance Bugs in Machine Learning Libraries
    arXiv.cs.PF Pub Date : 2020-06-03
    Saeid Tizpaz-Niari; Pavol Cerný; Ashutosh Trivedi

    Programming errors that degrade the performance of systems are widespread, yet there is little tool support for analyzing these bugs. We present a method based on differential performance analysis: we find inputs for which the performance varies widely, despite having the same size. To ensure that the differences in the performance are robust (i.e. hold also for large inputs), we compare the performance

  • MLOS: An Infrastructure for Automated Software Performance Engineering
    arXiv.cs.PF Pub Date : 2020-06-01
    Carlo Curino; Neha Godwal; Brian Kroth; Sergiy Kuryata; Greg Lapinski; Siqi Liu; Slava Oks; Olga Poppe; Adam Smiechowski; Ed Thayer; Markus Weimer; Yiwen Zhu

    Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The latter is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context. Today's SPE is performed during build/release phases by specialized teams, and cursed by: 1) lack of standardized

  • Staffing for many-server systems facing non-standard arrival processes
    arXiv.cs.PF Pub Date : 2020-05-31
    M. Heemskerk; M. Mandjes; B. Mathijsen

    Arrival processes to service systems often display (i) larger than anticipated fluctuations, (ii) a time-varying rate, and (iii) temporal correlation. Motivated by this, we introduce a specific non-homogeneous Poisson process that incorporates these three features. The resulting arrival process is fed into an infinite-server system, which is then used as a proxy for its many-server counterpart. This

  • Cloud-scale VM Deflation for Running Interactive Applications On Transient Servers
    arXiv.cs.PF Pub Date : 2020-05-31
    Alexander Fuerst; Ahmed Ali-Eldin; Prashant Shenoy; Prateek Sharma

    Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative mechanism to server preemption

  • Age of Information in a network of queues
    arXiv.cs.PF Pub Date : 2020-05-28
    Ioannis Koukoutsidis

    We show how to calculate the Age of Information in an overtake-free network of quasi-reversible queues, with exponential exogenous interarrivals of multiple classes of update packets and exponential service times at all nodes. Results are provided for any number of M/M/1 First-Come-First-Served (FCFS) queues in tandem, and for a network with two classes of update packets, entering through different

  • ProTuner: Tuning Programs with Monte Carlo Tree Search
    arXiv.cs.PF Pub Date : 2020-05-27
    Ameer Haj-Ali; Hasan Genc; Qijing Huang; William Moses; John Wawrzynek; Krste Asanović; Ion Stoica

    We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing. We build our framework on top of Halide and show that MCTS can outperform the state-of-the-art beam-search algorithm. Unlike beam search, which is guided by greedy intermediate performance comparisons between partial and less meaningful

  • Threshold-based rerouting and replication for resolving job-server affinity relations
    arXiv.cs.PF Pub Date : 2020-05-27
    Youri Raaijmakers; Sem Borst; Onno Boxma

    We consider a system with several job types and two parallel server pools. Within the pools the servers are homogeneous, but across pools possibly not in the sense that the service speed of a job may depend on its type as well as the server pool. Immediately upon arrival, jobs are assigned to a server pool. This could be based on (partial) knowledge of their type, but such knowledge might not be available

  • A review of analytical performance modeling and its role in computer engineering and science
    arXiv.cs.PF Pub Date : 2020-05-27
    Y. C. Tay

    This article is a review of analytical performance modeling for computer systems. It discusses the motivation for this area of research, examines key issues, introduces some ideas, illustrates how it is applied, and points out a role that it can play in developing Computer Science.

  • IoT-based Emergency Evacuation Systems
    arXiv.cs.PF Pub Date : 2020-05-27
    Mahyar Tourchi Moghaddam

    Fires, earthquakes, floods, hurricanes, overcrowding, and even pandemic viruses endanger human lives. Hence, designing infrastructures to handle possible emergencies has become an ever-increasing need. The safe evacuation of occupants from the building takes precedence when dealing with the necessary mitigation and disaster risk management. This thesis deals with designing an IoT system to provide

  • Benchmarking Graph Data Management and Processing Systems: A Survey
    arXiv.cs.PF Pub Date : 2020-05-26
    Miyuru Dayarathna; Toyotaro Suzumura

    The development of scalable, representative, and widely adopted benchmarks for graph data systems has been a question for which answers have been sought for decades. We conduct an in-depth study of the existing literature on benchmarks for graph data management and processing, covering 20 different benchmarks developed during the last 15 years. We categorize the benchmarks into three areas focusing

  • TeaMPI -- Replication-based Resilience without the (Performance) Pain
    arXiv.cs.PF Pub Date : 2020-05-25
    Philipp Samfass; Tobias Weinzierl; Benjamin Hazelwood; Michael Bader

    In an era where we cannot afford to checkpoint frequently, replication is a generic way forward to construct numerical simulations that can continue to run even if hardware parts fail. Yet, replication often is not employed on larger scales, as naïvely mirroring a computation once effectively halves the machine size, and as keeping replicated simulations consistent with each other is not trivial

  • Tsunami propagation for singular topographies
    arXiv.cs.PF Pub Date : 2020-05-25
    Arshyn Altybay; Michael Ruzhansky; Mohammed Elamine Sebih; Niyaz Tokmagambetov

    We consider a tsunami wave equation with singular coefficients and prove that it has a very weak solution. Moreover, we show the uniqueness results and consistency theorem of the very weak solution with the classical one in some appropriate sense. Numerical experiments are done for the families of regularised problems in one- and two-dimensional cases. In particular, the appearance of a substantial

  • Benchmarking and Performance Modelling of MapReduce Communication Pattern
    arXiv.cs.PF Pub Date : 2020-05-23
    Sheriffo Ceesay; Adam Barker; Yuhui Lin

    Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the ubiquity of application and workload configuration parameters make it challenging and expensive

  • A Comprehensive Study on Software Aging across Android Versions and Vendors
    arXiv.cs.PF Pub Date : 2020-05-23
    Domenico Cotroneo; Antonio Ken Iannillo; Roberto Natella; Roberto Pietrantuono

    This paper analyzes the phenomenon of software aging - namely, the gradual performance degradation and resource exhaustion in the long run - in the Android OS. The study intends to highlight if, and to what extent, devices from different vendors, under various usage conditions and configurations, are affected by software aging and which parts of the system are the main contributors. The results demonstrate

  • Profiling Resource Utilization of Bioinformatics Workflows
    arXiv.cs.PF Pub Date : 2020-05-23
    Huazeng Deng; Ling-Hong Hung; Raymond Schooley; David Perez; Niharika Arumilli; Ka Yee Yeung; Wes Lloyd

    We present a software tool, the Container Profiler, that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of a containerized job by collecting Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler can produce utilization snapshots at multiple time points, allowing

  • Autonomous Task Dropping Mechanism to Achieve Robustness in Heterogeneous Computing Systems
    arXiv.cs.PF Pub Date : 2020-05-22
    Ali Mokhtari; Chavit Denninnart; Mohsen Amini Salehi

    Robustness of a distributed computing system is defined as the ability to maintain its performance in the presence of uncertain parameters. Uncertainty is a key problem in heterogeneous (and even homogeneous) distributed computing systems that perturbs system robustness. Notably, the performance of these systems is perturbed by uncertainty in both task execution time and arrival. Accordingly, our goal

  • Mapping Matters: Application Process Mapping on 3-D Processor Topologies
    arXiv.cs.PF Pub Date : 2020-05-21
    Jonas H. Müller Korndörfer; Mario Bielert; Laércio L. Pilla; Florina M. Ciorba

    Applications' performance is influenced by the mapping of processes to computing nodes, the frequency and volume of exchanges among processing elements, the network capacity, and the routing protocol. A poor mapping of application processes degrades performance and wastes resources. Process mapping is frequently ignored as an explicit optimization step since the system typically offers a default mapping

  • Optimal Resource Allocation for Elastic and Inelastic Jobs
    arXiv.cs.PF Pub Date : 2020-05-19
    Benjamin Berg; Mor Harchol-Balter; Benjamin Moseley; Weina Wang; Justin Whitehouse

    Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to parallelizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although job

  • SoS-RPL: Securing Internet of Things Against Sinkhole Attack Using RPL Protocol-Based Node Rating and Ranking Mechanism
    arXiv.cs.PF Pub Date : 2020-05-17
    Mina Zaminkar; Reza Fotohi

    Through the Internet of Things (IoT) the internet scope is established by the integration of physical things to classify themselves into mutual things. A physical thing can be created by this inventive perception to signify itself in the digital world. Regarding the physical things that are related to the internet, it is worth noting that considering numerous theories and upcoming predictions, they

  • Latency Analysis of Multiple Classes of AVB Traffic in TSN with Standard Credit Behavior using Network Calculus
    arXiv.cs.PF Pub Date : 2020-05-17
    Luxi Zhao; Paul Pop; Zhong Zheng; Hugo Daigmorte; Marc Boyer

    Time-Sensitive Networking (TSN) is a set of amendments that extend Ethernet to support distributed safety-critical and real-time applications in the industrial automation, aerospace and automotive areas. TSN integrates multiple traffic types and supports interactions in several combinations. In this paper we consider the configuration supporting Scheduled Traffic (ST) traffic scheduled based on Gate-Control-Lists

  • Performance Analysis for Multi-Antenna Small Cell Networks with Clustered Dynamic TDD
    arXiv.cs.PF Pub Date : 2020-05-15
    Hongguang Sun; Howard H. Yang; Xijun Wang; Chao Xu; Tony Q. S. Quek

    Small cell networks with dynamic time-division duplex (D-TDD) have emerged as a potential solution to address the asymmetric traffic demands in 5G wireless networks. By allowing the dynamic adjustment of cell-specific UL/DL configuration, D-TDD flexibly allocates percentage of subframes to UL and DL transmissions to accommodate the traffic within each cell. However, the unaligned transmissions bring

  • High Performance and Portable Convolution Operators for ARM-based Multicore Processors
    arXiv.cs.PF Pub Date : 2020-05-13
    Pablo San Juan; Adrián Castelló; Manuel F. Dolz; Pedro Alonso-Jordá; Enrique S. Quintana-Ortí

    The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the im2col transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the

  • Competitive Algorithms for Minimizing the Maximum Age-of-Information
    arXiv.cs.PF Pub Date : 2020-05-12
    Rajarshi Bhattacharjee; Abhishek Sinha

    In this short paper, we consider the problem of designing a near-optimal competitive scheduling policy for $N$ mobile users, to maximize the freshness of available information uniformly across all users. Prompted by the unreliability and non-stationarity of the emerging 5G-mmWave channels for high-speed users, we forgo any statistical assumptions of the wireless channels and user-mobility. Instead

Contents have been reproduced by permission of the publishers.