Current journal: arXiv - CS - Performance
  • Leveraging Architectural Support of Three Page Sizes with Trident
    arXiv.cs.PF Pub Date : 2020-11-24
    Venkat Sri Sai Ram; Ashish Panwar; Arkaprava Basu

    Large pages are commonly deployed to reduce address translation overheads for big-memory workloads. Modern x86-64 processors from Intel and AMD support two large page sizes -- 1GB and 2MB. However, previous works on large pages have primarily focused on 2MB pages, partly due to lack of substantial evidence on the profitability of 1GB pages to real-world applications. We argue that in fact, inadequate

    Updated: 2020-11-25
  • Benchmarking Inference Performance of Deep Learning Models on Analog Devices
    arXiv.cs.PF Pub Date : 2020-11-24
    Omobayode Fagbohungbe; Lijun Qian

    Analog hardware implemented deep learning models are promising for computation and energy constrained systems such as edge computing devices. However, the analog nature of the device and the associated many noise sources will cause changes to the value of the weights in the trained deep learning models deployed on such devices. In this study, systematic evaluation of the inference performance of trained

    Updated: 2020-11-25
  • Patch-based field-of-view matching in multi-modal images for electroporation-based ablations
    arXiv.cs.PF Pub Date : 2020-11-09
    Luc Lafitte; Rémi Giraud; Cornel Zachiu; Mario Ries; Olivier Sutter; Antoine Petit; Olivier Seror; Clair Poignard; Baudouin Denis de Senneville

    Various multi-modal imaging sensors are currently involved at different steps of an interventional therapeutic work-flow. Cone beam computed tomography (CBCT), computed tomography (CT) or Magnetic Resonance (MR) images thereby provide complementary functional and/or structural information of the targeted region and organs at risk. Merging this information relies on a correct spatial alignment of the

    Updated: 2020-11-25
  • HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC
    arXiv.cs.PF Pub Date : 2020-11-22
    Michael Riera; Erfan Bank Tavakoli; Masudul Hassan Quraishi; Fengbo Ren

    Hardware-agnostic programming with high performance portability will be the bedrock for realizing the ubiquitous adoption of emerging accelerator technologies in future heterogeneous high-performance computing (HPC) systems, which is the key to achieving the next level of HPC performance on an expanding accelerator landscape. In this paper, we present HALO 1.0, an open-ended extensible multi-agent

    Updated: 2020-11-25
  • Optimal Transaction Queue Waiting in Blockchain Mining
    arXiv.cs.PF Pub Date : 2020-11-21
    Gholamreza Ramezan; Cyril Leung; Chunyan Miao

    Blockchain systems are being used in a wide range of application domains. They can support trusted transactions in time critical applications. In this paper, we study how miners should pick up transactions from a transaction pool so as to minimize the average waiting time per transaction. We derive an expression for the average transaction waiting time of the proposed mining scheme and determine the

    Updated: 2020-11-25
  • Enhanced Innovized Repair Operator for Evolutionary Multi- and Many-objective Optimization
    arXiv.cs.PF Pub Date : 2020-11-21
    Sukrit Mittal; Dhish Kumar Saxena; Kalyanmoy Deb; Erik Goodman

    "Innovization" is a task of learning common relationships among some or all of the Pareto-optimal (PO) solutions in multi- and many-objective optimization problems. Recent studies have shown that a chronological sequence of non-dominated solutions obtained in consecutive iterations during an optimization run also possesses salient patterns that can be used to learn problem features to help create new

    Updated: 2020-11-25
  • Zero Queueing for Multi-Server Jobs
    arXiv.cs.PF Pub Date : 2020-11-20
    Weina Wang; Qiaomin Xie; Mor Harchol-Balter

    Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-job-per-server model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From

    Updated: 2020-11-23
  • A Bounded Multi-Vacation Queue Model for Multi-stage Sleep Control 5G Base station
    arXiv.cs.PF Pub Date : 2020-11-19
    Jie Chen

    Modelling and control of energy consumption is an important problem in telecommunication systems. To model such systems, this paper presents a bounded multi-vacation queue model. The energy consumption predicted by the model shows an average error rate of 0.0177 and the delay predicted by the model shows an average error rate of 0.0655 over 99 test instances. Subsequently, an optimization algorithm

    Updated: 2020-11-21
  • FedEval: A Benchmark System with a Comprehensive Evaluation Model for Federated Learning
    arXiv.cs.PF Pub Date : 2020-11-19
    Di Chai; Leye Wang; Kai Chen; Qiang Yang

    As an innovative solution for privacy-preserving machine learning (ML), federated learning (FL) is attracting much attention from research and industry areas. While new technologies proposed in the past few years do evolve the FL area, unfortunately, the evaluation results presented in these works fall short in integrity and are hardly comparable because of the inconsistent evaluation metrics and the

    Updated: 2020-11-21
  • heSRPT: Parallel Scheduling to Minimize Mean Slowdown
    arXiv.cs.PF Pub Date : 2020-11-18
    Benjamin Berg; Rein Vesilo; Mor Harchol-Balter

    Modern data centers serve workloads which are capable of exploiting parallelism. When a job parallelizes across multiple servers it will complete more quickly, but jobs receive diminishing returns from being allocated additional servers. Because allocating multiple servers to a single job is inefficient, it is unclear how best to allocate a fixed number of servers between many parallelizable jobs.

    Updated: 2020-11-21
  • Performance Analysis of UAV-based Mixed RF-UWOC Transmission Systems
    arXiv.cs.PF Pub Date : 2020-11-18
    Sai Li; Liang Yang; Daniel Benevides da Costa

    In this paper, we investigate the performance of a mixed radio-frequency-underwater wireless optical communication (RF-UWOC) system where an unmanned aerial vehicle (UAV), as a low-altitude mobile aerial base station, transmits information to an autonomous underwater vehicle (AUV) through a fixed-gain amplify-and-forward (AF) or decode-and-forward (DF) relay. Our analysis accounts for the main factors

    Updated: 2020-11-19
  • On the Performance of RIS-Assisted Dual-Hop Mixed RF-UWOC Systems
    arXiv.cs.PF Pub Date : 2020-11-18
    Sai Li; Liang Yang; Daniel Benevides da Costa; Marco Di Renzo; Mohamed-Slim Alouini

    In this paper, we investigate the performance of a reconfigurable intelligent surface (RIS)-assisted dual-hop mixed radio-frequency underwater wireless optical communication (RF-UWOC) system. An RIS is an emerging and low-cost technology that aims to enhance the strength of the received signal, thus improving the system performance. In the considered system setup, a ground source does not have a reliable

    Updated: 2020-11-19
  • Performance Analysis of Dual-Hop Mixed PLC/RF Communication Systems
    arXiv.cs.PF Pub Date : 2020-11-18
    Liang Yang; Xiaoqin Yan; Sai Li; Daniel Benevides da Costa; Mohamed-Slim Alouini

    In this paper, we study a dual-hop mixed power line communication and radio-frequency communication (PLC/RF) system, where the connection between the PLC link and the RF link is made by a decode-and-forward (DF) or amplify-and-forward (AF) relay. Assume that the PLC channel is affected by both additive background noise and impulsive noise and suffers from Log-normal fading, while the RF link undergoes

    Updated: 2020-11-19
  • Ginkgo -- A Math Library designed for Platform Portability
    arXiv.cs.PF Pub Date : 2020-11-17
    Terry Cojean; Yu-Hsiang "Mike" Tsai; Hartwig Anzt

    The first associations to software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and also the existence of software documentation. However, when asking what is a common deathblow for a scientific software product, it is often the lack of platform and performance

    Updated: 2020-11-19
  • Automatic Microprocessor Performance Bug Detection
    arXiv.cs.PF Pub Date : 2020-11-17
    Erick Carvajal Barboza; Sara Jacob; Mahesh Ketkar; Michael Kishinevsky; Paul Gratz; Jiang Hu

    Processor design validation and debug is a difficult and complex task, which consumes the lion's share of the design process. Design bugs that affect processor performance rather than its functionality are especially difficult to catch, particularly in new microarchitectures. This is because, unlike functional bugs, the correct processor performance of new microarchitectures on complex, long-running

    Updated: 2020-11-18
  • Optimizing Graph Processing and Preprocessing with Hardware Assisted Propagation Blocking
    arXiv.cs.PF Pub Date : 2020-11-17
    Vignesh Balaji; Brandon Lucia

    Extensive prior research has focused on alleviating the characteristic poor cache locality of graph analytics workloads. However, graph pre-processing tasks remain relatively unexplored. In many important scenarios, graph pre-processing tasks can be as expensive as the downstream graph analytics kernel. We observe that Propagation Blocking (PB), a software optimization designed for SpMV kernels, generalizes

    Updated: 2020-11-18
  • Tools for modelling and simulating the Smart Grid
    arXiv.cs.PF Pub Date : 2020-11-16
    Ricardo M. Czekster

    The Smart Grid (SG) is a Cyber-Physical System (CPS) considered a critical infrastructure divided into cyber (software) and physical (hardware) counterparts that complement each other. It is responsible for timely power provision wrapped by Information and Communication Technologies (ICT) for handling bi-directional energy flows in electric power grids. Enacting control and performance over the massive

    Updated: 2020-11-17
  • Performance Analysis of an Interference-Limited RIS-Aided Network
    arXiv.cs.PF Pub Date : 2020-11-15
    Liang Yang; Yin Yang; Daniel Benevides da Costa; Imene Trigui

    In this work, the performance of reconfigurable intelligent surface (RIS)-aided communication systems corrupted by the co-channel interference (CCI) at the destination is investigated. Assuming Rayleigh fading and equal-power CCI, we present the analysis for the outage probability (OP), average bit error rate (BER), and ergodic capacity. In addition, an asymptotic outage analysis is carried out in order

    Updated: 2020-11-17
  • RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems
    arXiv.cs.PF Pub Date : 2020-11-14
    Bai Liu; Qiaomin Xie; Eytan Modiano

    With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning

    Updated: 2020-11-17
  • Phoebe: Reuse-Aware Online Caching with Reinforcement Learning for Emerging Storage Models
    arXiv.cs.PF Pub Date : 2020-11-13
    Nan Wu; Pengcheng Li

    With data durability, high access speed, low power efficiency and byte addressability, NVMe and SSD, which are acknowledged representatives of emerging storage technologies, have been applied broadly in many areas. However, one key issue with high-performance adoption of these technologies is how to properly define intelligent cache layers such that the performance gap between emerging technologies

    Updated: 2020-11-17
  • A Model of Polarization on Social Media Caused by Empathy and Repulsion
    arXiv.cs.PF Pub Date : 2020-11-16
    Naoki Hirakura; Masaki Aida; Konosuke Kawashima

    In recent years, the ease with which social media can be accessed has led to the unexpected problem of a shrinkage in information sources. This phenomenon is caused by a system that facilitates the connection of people with similar ideas and recommendation systems. Bias in the selection of information sources promotes polarization that divides people into multiple groups with opposing views and creates

    Updated: 2020-11-17
  • Performance and Power Modeling and Prediction Using MuMMI and Ten Machine Learning Methods
    arXiv.cs.PF Pub Date : 2020-11-12
    Xingfu Wu; Valerie Taylor; Zhiling Lan

    In this paper, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and ten machine learning methods to model and predict performance and power and compare their prediction error rates. We use a fault-tolerant linear algebra code and a fault-tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne

    Updated: 2020-11-16
  • Utilizing Ensemble Learning for Performance and Power Modeling and Improvement of Parallel Cancer Deep Learning CANDLE Benchmarks
    arXiv.cs.PF Pub Date : 2020-11-12
    Xingfu Wu; Valerie Taylor

    Machine learning (ML) continues to grow in importance across nearly all domains and is a natural tool in modeling to learn from data. Often a tradeoff exists between a model's ability to minimize bias and variance. In this paper, we utilize ensemble learning to combine linear, nonlinear, and tree-/rule-based ML methods to cope with the bias-variance tradeoff and result in more accurate models. Hardware

    Updated: 2020-11-16
  • DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator
    arXiv.cs.PF Pub Date : 2020-11-11
    Zihan Liu; Jingwen Leng; Quan Chen; Chao Li; Wenli Zheng; Li Li; Minyi Guo

    Many hardware vendors have introduced specialized deep neural network (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize the code for the hardware accelerator becomes an important yet less explored problem. In this paper, we perform the compiler-stage optimization study using a novel and representative Cambricon DNN accelerator and demonstrate

    Updated: 2020-11-12
  • Optimizing the Age-of-Information for Mobile Users in Adversarial and Stochastic Environments
    arXiv.cs.PF Pub Date : 2020-11-10
    Abhishek Sinha; Rajarshi Bhattacharjee

    We study a multi-user downlink scheduling problem for optimizing the freshness of information available to users roaming across multiple cells. We consider both adversarial and stochastic settings and design scheduling policies that optimize two distinct information freshness metrics, namely the average age-of-information and the peak age-of-information. We show that a natural greedy scheduling policy

    Updated: 2020-11-12
  • Resource Allocation in One-dimensional Distributed Service Networks with Applications
    arXiv.cs.PF Pub Date : 2020-11-09
    Nitish K. Panigrahy; Prithwish Basu; Philippe Nain; Don Towsley; Ananthram Swami; Kevin S. Chan; Kin K. Leung

    We consider assignment policies that allocate resources to users, where both resources and users are located on a one-dimensional line. First, we consider unidirectional assignment policies that allocate resources only to users located to their left. We propose the Move to Right (MTR) policy, which scans from left to right assigning nearest rightmost available resource to a user, and contrast it to

    Updated: 2020-11-12
  • Assessing the Feasibility of Web-Request Prediction Models on Mobile Platforms
    arXiv.cs.PF Pub Date : 2020-11-10
    Yixue Zhao; Siwei Yin; Adriana Sejfia; Marcelo Schmitt Laser; Haoyu Wang; Nenad Medvidovic

    Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations make it infeasible to explore prefetching with the usual strategy of amassing large amounts of data over long periods and constructing conventional, "large" prediction

    Updated: 2020-11-12
  • Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture
    arXiv.cs.PF Pub Date : 2020-11-06
    Jesmin Jahan Tithi; Fabrizio Petrini; Hongbo Rong; Andrei Valentin; Carl Ebeling

    Stencils represent a class of computational patterns where an output grid point depends on a fixed shape of neighboring points in an input grid. Stencil computations are prevalent in scientific applications engaging a significant portion of supercomputing resources. Therefore, it has always been important to optimize stencil programs for the best performance. A rich body of research has focused on

    Updated: 2020-11-12
  • An approach to define Very High Capacity Networks with improved quality at an affordable cost
    arXiv.cs.PF Pub Date : 2020-11-07
    Giovanni Santella; Francesco Vatalaro

    This paper aims to propose one possible approach in the setting of VHCNs (Very High Capacity Networks) performance targets that should be capable of promoting efficient investments for operators and, at the same time, improving the benefits for end-users. To this aim, we suggest relying on some specific KPIs (Key Performance Indicators), especially throughput - i.e., the bandwidth as perceived by the

    Updated: 2020-11-12
  • Towards Latency-aware DNN Optimization with GPU Runtime Analysis and Tail Effect Elimination
    arXiv.cs.PF Pub Date : 2020-11-08
    Fuxun Yu; Zirui Xu; Tong Shen; Dimitrios Stamoulis; Longfei Shangguan; Di Wang; Rishi Madhok; Chunshui Zhao; Xin Li; Nikolaos Karianakis; Dimitrios Lymberopoulos; Ang Li; ChenChen Liu; Yiran Chen; Xiang Chen

    Despite the superb performance of State-Of-The-Art (SOTA) DNNs, the increasing computational cost makes them very challenging to meet real-time latency and accuracy requirements. Although DNN runtime latency is dictated by model property (e.g., architecture, operations), hardware property (e.g., utilization, throughput), and more importantly, the effective mapping between these two, many existing approaches

    Updated: 2020-11-12
  • Runtime Performances Benchmark for Knowledge Graph Embedding Methods
    arXiv.cs.PF Pub Date : 2020-11-05
    Angelica Sofia Valeriani

    This paper focuses on providing a characterization of the runtime performances of state-of-the-art implementations of KGE algorithms, in terms of memory footprint and execution time. Despite the rapidly growing interest in KGE methods, so far little attention has been devoted to their comparison and evaluation; in particular, previous work mainly focused on performance in terms of accuracy in

    Updated: 2020-11-12
  • Power-Aware Run-Time Scheduler for Mixed-Criticality Systems on Multi-Core Platform
    arXiv.cs.PF Pub Date : 2020-11-06
    Behnaz Ranjbar; Tuan D. A. Nguyen; Alireza Ejlali; Akash Kumar

    In modern multi-core Mixed-Criticality (MC) systems, a rise in peak power consumption due to parallel execution of tasks with maximum frequency, especially in the overload situation, may lead to thermal issues, which may affect the reliability and timeliness of MC systems. Therefore, managing peak power consumption has become imperative in multi-core MC systems. In this regard, we propose an online

    Updated: 2020-11-09
  • On the Analysis of Spatially Constrained Power of Two Choice Policies
    arXiv.cs.PF Pub Date : 2020-11-05
    Nitish K. Panigrahy; Prithwish Basu; Don Towsley; Ananthram Swami; Kin K. Leung

    We consider a class of power of two choice based assignment policies for allocating users to servers, where both users and servers are located on a two-dimensional Euclidean plane. In this framework, we investigate the inherent tradeoff between the communication cost, and load balancing performance of different allocation policies. To this end, we first design and evaluate a Spatial Power of two (sPOT)

    Updated: 2020-11-06
  • Simulation-Based Performance Prediction of HPC Applications: A Case Study of HPL
    arXiv.cs.PF Pub Date : 2020-11-05
    Gen Xu; Huda Ibeid; Xin Jiang; Vjekoslav Svilan; Zhaojuan Bian

    We propose a simulation-based approach for performance modeling of parallel applications on high-performance computing platforms. Our approach enables full-system performance modeling: (1) the hardware platform is represented by an abstract yet high-fidelity model; (2) the computation and communication components are simulated at a functional level, where the simulator allows the use of the components

    Updated: 2020-11-06
  • No more 996: Understanding Deep Learning Inference Serving with an Automatic Benchmarking system
    arXiv.cs.PF Pub Date : 2020-11-04
    Huaizheng Zhang; Yizheng Huang; Yonggang Wen; Jianxiong Yin; Kyle Guan

    Deep learning (DL) models have become core modules for many applications. However, deploying these models without careful performance benchmarking that considers both hardware and software's impact often leads to poor service and costly operational expenditure. To facilitate DL models' deployment, we implement an automatic and comprehensive benchmark system for DL developers. To accomplish benchmark-related

    Updated: 2020-11-06
  • Proximity Based Load Balancing Policies on Graphs: A Simulation Study
    arXiv.cs.PF Pub Date : 2020-11-03
    Nitish K. Panigrahy; Thirupathaiah Vasantam; Prithwish Basu; Don Towsley

    Distributed load balancing is the act of allocating jobs among a set of servers as evenly as possible. There are mainly two versions of the load balancing problem that have been studied in the literature: static and dynamic. The static interpretation leads to formulating the load balancing problem as a case with jobs (balls) never leaving the system and accumulating at the servers (bins) whereas the

    Updated: 2020-11-04
  • Solving large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs: performance comparisons of MPGOS, ODEINT and DifferentialEquations.jl
    arXiv.cs.PF Pub Date : 2020-11-02
    Dániel Nagy; Lambert Plavecz; Ferenc Hegedűs

    In this paper, the performance characteristics of different solution techniques and program packages to solve a large number of independent ordinary differential equation systems are examined. The employed hardware are an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU that has a total of 1707 GFLOPS peak double-precision performance

    Updated: 2020-11-04
  • An End-to-End ML System for Personalized Conversational Voice Models in Walmart E-Commerce
    arXiv.cs.PF Pub Date : 2020-11-02
    Rahul Radhakrishnan Iyer; Praveenkumar Kanumala; Stephen Guo; Kannan Achan

    Searching for and making decisions about products is becoming increasingly easier in the e-commerce space, thanks to the evolution of recommender systems. Personalization and recommender systems have gone hand-in-hand to help customers fulfill their shopping needs and improve their experiences in the process. With the growing adoption of conversational platforms for shopping, it has become important

    Updated: 2020-11-03
  • 10 Years Later: Cloud Computing is Closing the Performance Gap
    arXiv.cs.PF Pub Date : 2020-11-02
    Giulia Guidi; Marquita Ellis; Aydin Buluc; Katherine Yelick; David Culler

    Large scale modeling and simulation problems, from nanoscale materials to universe-scale cosmology, have in the past used the massive computing resources of High-Performance Computing (HPC) systems. Over the last decade, cloud computing has gained popularity for business applications and increasingly for computationally intensive machine learning problems. Despite the prolific literature, the question

    Updated: 2020-11-03
  • An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs
    arXiv.cs.PF Pub Date : 2020-10-31
    Ayesha Afzal; Georg Hager; Gerhard Wellein

    Complex applications running on multicore processors show a rich performance phenomenology. The growing number of cores per ccNUMA domain complicates performance analysis of memory-bound code since system noise, load imbalance, or task-based programming models can lead to thread desynchronization. Hence, the simplifying assumption that all cores execute the same loop cannot be upheld. Motivated by

    Updated: 2020-11-03
  • Effects of round-to-nearest and stochastic rounding in the numerical solution of the heat equation in low precision
    arXiv.cs.PF Pub Date : 2020-10-30
    Matteo Croci; Michael Bryce Giles

    Motivated by the advent of machine learning, the last few years saw the return of hardware-supported low-precision computing. Computations with fewer digits are faster and more memory and energy efficient, but can be extremely susceptible to rounding errors. An application that can largely benefit from the advantages of low-precision computing is the numerical solution of partial differential equations

    Updated: 2020-11-02
  • Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices
    arXiv.cs.PF Pub Date : 2020-10-30
    Guangli Li; Xiu Ma; Xueying Wang; Lei Liu; Jingling Xue; Xiaobing Feng

    The increasing computational cost of deep neural network models limits the applicability of intelligent applications on resource-constrained edge devices. While a number of neural network pruning methods have been proposed to compress the models, prevailing approaches focus only on parametric operators (e.g., convolution), which may miss optimization opportunities. In this paper, we present a novel

    Updated: 2020-11-02
  • Poster: Benchmarking Financial Data Feed Systems
    arXiv.cs.PF Pub Date : 2020-10-29
    Manuel Coenen; Christoph Wagner; Alexander Echler; Sebastian Frischbier

    Data-driven solutions for the investment industry require event-based backend systems to process high-volume financial data feeds with low latency, high throughput, and guaranteed delivery modes. At vwd we process an average of 18 billion incoming event notifications from 500+ data sources for 30 million symbols per day and peak rates of 1+ million notifications per second using custom-built platforms

    Updated: 2020-10-30
  • Self-Learning Threshold-Based Load Balancing
    arXiv.cs.PF Pub Date : 2020-10-29
    Diego Goldsztajn (Eindhoven University of Technology); Sem C. Borst (Eindhoven University of Technology); Johan S. H. van Leeuwaarden (Tilburg University); Debankur Mukherjee (Georgia Institute of Technology); Philip A. Whiting (Macquarie University)

    We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The dispatcher uses a threshold for balancing the load and keeping the maximum number of concurrent tasks across server pools low. We demonstrate that such a policy is optimal on the fluid and diffusion scales for a suitable threshold value, while only involving

    Updated: 2020-10-30
  • Experimental Analysis of Communication Relaying Delay in Low-Energy Ad-hoc Networks
    arXiv.cs.PF Pub Date : 2020-10-29
    Taichi Miya; Kohta Ohshima; Yoshiaki Kitaguchi; Katsunori Yamaoka

    In recent years, more and more applications use ad-hoc networks for local M2M communications, but in some cases such as when using WSNs, the software processing delay induced by packet relaying may not be negligible. In this paper, we planned and carried out a delay measurement experiment using Raspberry Pi Zero W. The results demonstrated that, in low-energy ad-hoc networks, processing delay of the

    Updated: 2020-10-30
  • Advanced Python Performance Monitoring with Score-P
    arXiv.cs.PF Pub Date : 2020-10-29
    Andreas Gocht; Robert Schöne; Jan Frenzel

    Within the last years, Python became more prominent in the scientific community and is now used for simulations, machine learning, and data analysis. All these tasks profit from additional compute power offered by parallelism and offloading. In the domain of High Performance Computing (HPC), we can look back to decades of experience exploiting different levels of parallelism on the core, node or inter-node

    Updated: 2020-10-30
  • Measurement-based coexistence studies of LAA & Wi-Fi deployments in Chicago
    arXiv.cs.PF Pub Date : 2020-10-28
    Vanlin Sathya; Muhammad Iqbal Rochman; Monisha Ghosh

    LTE-Licensed Assisted Access (LAA) networks are beginning to be deployed widely in major metropolitan areas in the US in the unlicensed 5 GHz bands, which have existing dense deployments of Wi-Fi as well. Various aspects of the coexistence scenarios such deployments give rise to have been considered in a vast body of academic and industry research. However, there is very little data and research on

    Updated: 2020-10-30
  • EdgeBench: A Workflow-based Benchmark for Edge Computing
    arXiv.cs.PF Pub Date : 2020-10-27
    Qirui Yang; Runyu Jin; Nabil Gandhi; Xiongzi Ge; Hoda Aghaei Khouzani; Ming Zhao

    Edge computing has been developed to utilize multiple tiers of resources for privacy, cost and Quality of Service (QoS) reasons. Edge workloads are characteristically data-driven and latency-sensitive. Because of this, edge systems have developed to be both heterogeneous and distributed. The unique characteristics of edge workloads and edge systems have motivated EdgeBench, a workflow-based benchmark

    Updated: 2020-10-30
  • Building a SDN Enterprise WLAN Based on Virtual APs
    arXiv.cs.PF Pub Date : 2020-10-26
    Luis Sequeira; Juan Luis de la Cruz; Jose Ruiz-Mas; Jose Saldana; Julian Fernandez-Navajas; Jose Almodovar

    In this letter the development and testing of an open enterprise Wi-Fi solution based on virtual APs, managed by a central WLAN controller is presented. It allows seamless handovers between APs in different channels, maintaining the QoS of real-time services. The potential scalability issues associated to the beacon generation and channel assignment have been addressed. A battery of tests has been

    Updated: 2020-10-30
  • Enhancing Cloud Storage with Shareable Instances for Social Computing
    arXiv.cs.PF Pub Date : 2020-10-26
    Ying Mao; Peizhao Hu

    Cloud storage plays an important role in social computing. This paper aims to develop a cloud storage management system for mobile devices to support an extended set of file operations. Because of the limit of storage, bandwidth, power consumption, and other resource restrictions, most existing cloud storage apps for smartphones do not keep local copies of files. This efficient design, however, limits

    Updated: 2020-10-30
  • ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems
    arXiv.cs.PF Pub Date : 2020-10-24
    Suresh Nambi; Salim Ullah; Aditya Lohana; Siva Satyendra Sahoo; Farhad Merchant; Akash Kumar

    The recent advances in machine learning, in general, and Artificial Neural Networks (ANN), in particular, have made smart embedded systems an attractive option for a larger number of application areas. However, the high computational complexity, memory footprints, and energy requirements of machine learning models hinder their deployment on resource-constrained embedded systems. Most state-of-the-art

    Updated: 2020-10-30
  • Differentiate Quality of Experience Scheduling for Deep Learning Applications with Docker Containers in the Cloud
    arXiv.cs.PF Pub Date : 2020-10-24
    Ying Mao; Weifeng Yan; Yun Song; Yue Zeng; Ming Chen; Long Cheng; Qingzhi Liu

    With the prevalence of big-data-driven applications, such as face recognition on smartphones and tailored recommendations from Google Ads, we are on the road to a lifestyle with significantly more intelligence than ever before. For example, Aipoly Vision [1] is an object and color recognizer that helps the blind, visually impaired, and color blind understand their surroundings. At the back end side

    Updated: 2020-10-30
  • Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks
    arXiv.cs.PF Pub Date : 2020-10-23
    John Brennan; Stephen Bonner; Amir Atapour-Abarghouei; Philip T Jackson; Boguslaw Obara; Andrew Stephen McGough

    With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often operate over the entire adjacency matrix -- as the input and intermediate network layers are all designed in proportion to the size of the adjacency matrix -- leading

    Updated: 2020-10-30
  • Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios
    arXiv.cs.PF Pub Date : 2020-10-23
    Raúl Nozal; Jose Luis Bosque; Ramon Beivide

    Heterogeneous systems are found everywhere from powerful supercomputers to desktop computers and mobile devices, thanks to their excellent performance and energy consumption. The ubiquity of these architectures in both desktop systems and medium-sized service servers allows enough variability to exploit a wide range of problems, such as multimedia workloads, video encoding, image filtering and inference

    Updated: 2020-10-30
  • Analysis and Verification of Relation between Digitizer's Sampling Properties and Energy Resolution of HPGe Detectors
    arXiv.cs.PF Pub Date : 2020-10-23
    Jinfu Zhu; Tianhao Wang; Tao Xue; Liangjun Wei; Jingjun Wen; Lin Jiang; Jianmin Li

    The CDEX (China Dark matter Experiment) aims at detection of WIMPs (Weakly Interacting Massive Particles) and 0νββ (Neutrinoless double beta decay) of 76Ge. It now uses ~10 kg HPGe (High Purity Germanium) detectors in CJPL (China Jinping Underground Laboratory). The energy resolution of detectors is calculated via height spectrum of waveforms with 6-µs shaping time. It is necessary to know how sampling

    Updated: 2020-10-26
  • Speculative Container Scheduling for Deep Learning Applications in a Kubernetes Cluster
    arXiv.cs.PF Pub Date : 2020-10-21
    Ying Mao; Yuqi Fu; Wenjia Zheng; Long Cheng; Qingzhi Liu; Dingwen Tao

    In the past decade, we have witnessed a dramatically increasing volume of data collected from varied sources. The explosion of data has transformed the world as more information is available for collection and analysis than ever before. To maximize the utilization, various machine and deep learning models have been developed, e.g. CNN [1] and RNN [2], to study data and extract valuable information

    Updated: 2020-10-26
  • Optimising the Performance of Convolutional Neural Networks across Computing Systems using Transfer Learning
    arXiv.cs.PF Pub Date : 2020-10-20
    Rik Mulder; Valentin Radu; Christophe Dubach

    The choice of convolutional routines (primitives) to implement neural networks has a tremendous impact on their inference performance (execution speed) on a given hardware platform. To optimise a neural network by primitive selection, the optimal primitive is identified for each layer of the network. This process requires a lengthy profiling stage, iterating over all the available primitives for each

    Updated: 2020-10-26
  • Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
    arXiv.cs.PF Pub Date : 2020-10-20
    Ying Mao; Yuqi Fu; Suwen Gu; Sudip Vhaduri; Long Cheng; Qingzhi Liu

    Businesses have increasingly adopted and incorporated cloud technology into internal processes in the last decade. The cloud-based deployment provides on-demand availability without active management. More recently, the concept of cloud-native application has been proposed and represents an invaluable step toward helping organizations develop software faster and update it more frequently to

    Updated: 2020-10-26
  • Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources
    arXiv.cs.PF Pub Date : 2020-10-20
    George Bisbas; Fabio Luporini; Mathias Louboutin; Rhodri Nelson; Gerard Gorman; Paul H. J. Kelly

    Stencil kernels dominate a range of scientific applications including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimisation that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, optimising

    Updated: 2020-10-26
  • Evaluating the Cost of Atomic Operations on Modern Architectures
    arXiv.cs.PF Pub Date : 2020-10-19
    Hermann Schweizer; Maciej Besta; Torsten Hoefler

    Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are ubiquitous in parallel programming. Yet, performance tradeoffs between these operations and various characteristics of such systems, such as the structure of caches, are unclear and have not been thoroughly analyzed. In this paper we establish an evaluation methodology, develop a performance model, and present a set

    Updated: 2020-10-26
Contents have been reproduced by permission of the publishers.