• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Minseo Kang, Jae-Gil Lee

Spark is one of the most widely used systems for the distributed processing of big data. Its performance bottlenecks are mainly due to network I/O, disk I/O, and garbage collection. Previous studies quantitatively analyzed the performance impact of these bottlenecks but did not focus on iterative algorithms. In an iterative algorithm, garbage collection has a greater performance impact than in other workloads because the algorithm repeatedly loads and deletes data in main memory over multiple iterations. Spark provides three caching mechanisms, “disk cache,” “memory cache,” and “no cache,” to keep unchanged data across iterations. In this paper, we provide an in-depth experimental analysis of the effect of garbage collection on the overall performance depending on the caching mechanisms of Spark with various combinations of algorithms and datasets. The experimental results show that garbage collection accounts for 16–47% of the total elapsed time of running iterative algorithms on Spark and that the memory cache is no less advantageous in terms of garbage collection than the disk cache. We expect the results of this paper to serve as a guide for the tuning of garbage collection in the running of iterative algorithms on Spark.

Updated: 2020-01-17
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Ali Javed, Khalid Mahmood Malik, Aun Irtaza, Hafiz Malik

Automated approaches to analyze sports video content have been heavily explored in the last few decades to develop more informative and effective solutions for replay detection, shot classification, key-events detection, and summarization. Shot transition detection and classification are commonly applied to perform temporal segmentation for video content analysis. Accurate shot classification is an indispensable requirement to precisely detect the key-events and generate more informative summaries of the sports videos. Current state-of-the-art methods have several limitations, e.g., the use of inflexible game-specific rule-based approaches, high computational cost, and dependency on editing effects, game structure, and camera variations. In this paper, we propose an effective decision tree architecture for shot classification of field sports videos to address the aforementioned issues. For this purpose, we employ the combination of low-, mid-, and high-level features to develop an interpretable and computationally efficient decision tree framework for shot classification. Rule-based induction is applied to create various rules using the decision tree to classify the video shots into long, medium, close-up, and out-of-field shots. One of the significant contributions of the proposed work is to find the most reliable rules that are least unpredictable for shot classification. The proposed shot classification method is robust to variations in camera, illumination conditions, game structure, video length, sports genre, broadcasters, etc. The performance of our method is evaluated on a YouTube dataset covering three different sports genres that is diverse in terms of length, quantity, broadcasters, camera variations, editing effects, and illumination conditions. The proposed method provides superior shot classification performance and achieves an average improvement of 6.9% in precision and 9.1% in recall as compared to contemporary methods under the above-mentioned limitations.

Updated: 2020-01-17
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16

MapReduce framework is an effective method for big data parallel processing. Enhancing the performance of MapReduce clusters, along with reducing their job execution time, is a fundamental challenge to this approach. In fact, one is faced with two challenges here: how to maximize the execution overlap between jobs and how to create an optimum job scheduling. Accordingly, one of the most critical challenges to achieving these goals is developing a precise model to estimate the job execution time due to the large number and high volume of the submitted jobs, limited consumable resources, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. Moreover, a novel heuristic method is designed, which significantly reduces the makespan of the jobs. In this method, first by providing the job profiling tool, we obtain the execution details of the MapReduce phases through log analysis. Then, using machine learning methods and statistical analysis, we propose a relevant model to predict runtime. Finally, another tool called job submission and monitoring tool is used for calculating makespan. Different experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the average makespan speedup for the proposed method was higher than an unoptimized case.

Updated: 2020-01-17
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Xingquan Li, Cong Cao, Tao Zhang

Clustering, or partitioning, is a fundamental task for graphs and networks. Detecting communities is a typical clustering task, which divides a network into several parts according to modularity. Community detection is a critical challenge for designing scalable, adaptive and survivable trust management protocols for community of interest-based social IoT systems. Most existing methods for community detection share a common issue: the number of communities must be decided in advance. This motivates estimating the number of communities from the data. This paper considers estimating the number of communities and detecting the communities concurrently, based on a block diagonal dominance adjacency matrix. To construct a block diagonal dominance adjacency matrix for the input network, it first renumbers the nodes using the breadth-first search algorithm. For the block diagonal dominance adjacency matrix, this paper shows that the node numbers within a community should be consecutive. Thus, it only needs to insert breakpoints into the node number sequence to decide the number of communities and the nodes in every community. In addition, a dynamic programming algorithm is designed to achieve an optimal community detection result. Experimental results on a number of real-world networks show the effectiveness of the dynamic programming approach on the community detection problem.
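
The pipeline described in this abstract (renumber nodes by breadth-first search, then choose breakpoints in the node sequence by dynamic programming) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the objective function, so the sketch scores each contiguous segment with the standard Newman modularity term, and the function names `bfs_order` and `segment_by_modularity` are hypothetical. It also assumes a connected, undirected graph given as an adjacency dictionary.

```python
from collections import deque


def bfs_order(adj, start=0):
    """Return the nodes in breadth-first order from `start` (graph assumed connected)."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def segment_by_modularity(adj):
    """Split the BFS node sequence into contiguous communities maximizing modularity."""
    order = bfs_order(adj)
    pos = {node: i for i, node in enumerate(order)}
    n = len(order)
    m = sum(len(nb) for nb in adj.values()) / 2  # number of undirected edges
    deg = [len(adj[u]) for u in order]

    def score(i, j):
        # Modularity contribution of the segment order[i:j] (assumed objective).
        inside = sum(
            1 for a in range(i, j) for v in adj[order[a]] if i <= pos[v] < j
        ) / 2
        return inside / m - (sum(deg[i:j]) / (2 * m)) ** 2

    # best[j]: best total score for the first j nodes; cut[j]: start of the last segment.
    best = [0.0] + [float("-inf")] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], cut[j] = cand, i

    # Walk the breakpoints backwards to recover the communities.
    parts, j = [], n
    while j > 0:
        parts.append(order[cut[j]:j])
        j = cut[j]
    return parts[::-1], best[n]


# Two triangles joined by the single edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities, quality = segment_by_modularity(graph)
```

Because the assumed score is additive over segments, the number of communities falls out of the dynamic program rather than being fixed in advance, which is the point the abstract emphasizes.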

Updated: 2020-01-17
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Fatemeh Safara, Alireza Souri, Thar Baker, Ismaeel Al Ridhawi, Moayad Aloqaily

The Internet of Things (IoT) devices gather a plethora of data by sensing and monitoring the surrounding environment. Transmission of collected data from the IoT devices to the cloud through relay nodes is one of the many challenges that arise from IoT systems. Fault tolerance, security, energy consumption and load balancing are all examples of issues revolving around data transmissions. This paper focuses on energy consumption, where a priority-based and energy-efficient routing (PriNergy) method is proposed. The method is based on the routing protocol for low-power and lossy network (RPL) model, which determines routing through contents. Each network slot uses timing patterns when sending data to the destination, while considering network traffic, audio and image data. This technique increases the robustness of the routing protocol and ultimately prevents congestion. Experimental results demonstrate that the proposed PriNergy method reduces overhead on the mesh, end-to-end delay and energy consumption. Moreover, it outperforms one of the most successful routing methods in an IoT environment, namely the quality of service RPL (QRPL).

Updated: 2020-01-17
• Theor. Comput. Sci. (IF 0.718) Pub Date : 2019-07-04
Gerold Jäger; Frank Drewes

In this work we determine the metric dimension of $\mathbb{Z}_n \times \mathbb{Z}_n \times \mathbb{Z}_n$ as $\lfloor 3n/2 \rfloor$ for all $n \geq 2$. We prove this result by investigating a variant of Mastermind. Mastermind is a famous two-player game that has attracted much attention in the literature in recent years. In particular we consider the static (also called non-adaptive) black-peg variant of Mastermind. The game is played by a codemaker and a codebreaker. Given $c$ colors and $p$ pegs, the principal rule is that the codemaker has to choose a secret by assigning colors to the pegs, i.e., the secret is a $p$-tuple of colors, and the codebreaker asks a number of questions all at once. Like the secret, a question is a $p$-tuple of colors chosen from the $c$ available colors. The codemaker then answers all of those questions by telling the codebreaker how many pegs in each question are correctly colored. The goal is to find the minimal number of questions that allows the codebreaker to determine the secret from the received answers. We present such a strategy for this game for $p = 3$ pegs and an arbitrary number $c \geq 2$ of colors using $\lfloor 3c/2 \rfloor + 1$ questions, which we prove to be both feasible and optimal. The minimal number of questions required for $p$ pegs and $c$ colors is easily seen to be equal to the metric dimension of $\mathbb{Z}_c^p$ plus 1, which proves our main result.
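
To make the static black-peg setting concrete, the following brute-force sketch computes black-peg answers and searches for the smallest set of questions whose answers tell all secrets apart. It is an illustration only, not the strategy from the paper; the function names are hypothetical, and the search is exponential, so it is only practical for very small numbers of colors and pegs. The size it returns corresponds to the metric dimension in the abstract's statement, and the abstract's question count is that value plus 1.

```python
from itertools import combinations, product


def black_pegs(secret, question):
    """Number of positions where the question matches the secret."""
    return sum(s == q for s, q in zip(secret, question))


def determines_secret(questions, codes):
    """True if the answer vectors for `questions` are distinct for every code."""
    answers = {tuple(black_pegs(code, q) for q in questions) for code in codes}
    return len(answers) == len(codes)


def min_distinguishing_questions(colors, pegs):
    """Smallest number of simultaneous questions that separates all secrets."""
    codes = list(product(range(colors), repeat=pegs))
    for k in range(1, len(codes) + 1):
        for questions in combinations(codes, k):
            if determines_secret(questions, codes):
                return k


# For 2 colors and 3 pegs the smallest distinguishing set has 3 questions,
# matching floor(3 * 2 / 2) = 3 from the abstract's formula.
smallest = min_distinguishing_questions(2, 3)
```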

Updated: 2020-01-17
• Theor. Comput. Sci. (IF 0.718) Pub Date : 2020-01-16
Titus H. Klinge; James I. Lathrop; Jack H. Lutz

We present a uniform method for translating an arbitrary nondeterministic finite automaton (NFA) into a deterministic mass action input/output chemical reaction network (I/O CRN) that simulates it. The I/O CRN receives its input as a continuous time signal consisting of concentrations of chemical species that vary to represent the NFA's input string in a natural way. The I/O CRN exploits the inherent parallelism of chemical kinetics to simulate the NFA in real time with a number of chemical species that is linear in the size of the NFA. We prove that the simulation is correct and that it is robust with respect to perturbations of the input signal, the initial concentrations of species, the output (decision), and the rate constants of the reactions of the I/O CRN.

Updated: 2020-01-17
• Theor. Comput. Sci. (IF 0.718) Pub Date : 2020-01-16
Benjamin Doerr; Timo Kötzing; J.A. Gregor Lagodzinski; Johannes Lengler

While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat. In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism. More specifically, we analyzed the performance of the (1+1) GP on the two benchmark functions Order and Majority. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of $O(T_{\mathrm{init}} + n \log n)$ iterations both for Order and Majority. For the case without bloat control, the bounds $O(T_{\mathrm{init}} \log T_{\mathrm{init}} + n (\log n)^3)$ and $\Omega(T_{\mathrm{init}} + n \log n)$ (and $\Omega(T_{\mathrm{init}} \log T_{\mathrm{init}})$ for $n = 1$) hold for Majority.

Updated: 2020-01-17
• Theor. Comput. Sci. (IF 0.718) Pub Date : 2020-01-16
Nikolay Vereshchagin

The purpose of this paper is to answer two questions left open in [B. Durand, A. Shen, and N. Vereshchagin, Descriptive Complexity of Computable Sequences, Theoretical Computer Science 171 (2001), pp. 47–58]. Namely, we consider the following two complexities of an infinite computable 0-1-sequence $\alpha$: $C^{0'}(\alpha)$, defined as the minimal length of a program with oracle $0'$ that prints $\alpha$, and $M_\infty(\alpha)$, defined as $\limsup C(\alpha_{1:n} \mid n)$, where $\alpha_{1:n}$ denotes the length-$n$ prefix of $\alpha$ and $C(x \mid y)$ stands for conditional Kolmogorov complexity. We show that $C^{0'}(\alpha) \leqslant M_\infty(\alpha) + O(1)$ and $M_\infty(\alpha)$ is not bounded by any computable function of $C^{0'}(\alpha)$, even on the domain of computable sequences.

Updated: 2020-01-17
• Inform. Syst. (IF 2.066) Pub Date : 2020-01-17

Updated: 2020-01-17
• Inform. Syst. (IF 2.066) Pub Date : 2020-01-17
Rutian Liu; Eric Simon; Bernd Amann; Stéphane Gançarski

The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, despite the profusion of available datasets, it remains quite difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This article describes a model and algorithms that exploit automatically extracted and user-defined semantic relationships for extending analytic datasets with new atomic or aggregated attribute values. Our framework is implemented as a REST service in SAP HANA and includes a careful theoretical analysis and practical solutions for several complex data quality issues.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-25
German Maglione-Mathey; Jesus Escudero-Sahuquillo; Pedro Javier Garcia; Francisco J. Quiles; José Duato

The number of endnodes in high-performance computing and datacenter systems is constantly increasing. Hence, it is crucial to minimize the impact of network congestion to guarantee a suitable network performance. InfiniBand is a prominent interconnect technology that allows implementing efficient topologies and routing algorithms, as well as queuing schemes that reduce the head-of-line (HoL) blocking effect derived from congestion situations. Here, we explain and evaluate thoroughly a queuing scheme called Path2SL that optimizes the use of the InfiniBand Virtual Lanes to reduce the HoL blocking in fat-tree network topologies.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-30

Multichiplet system-in-package designs have recently received a lot of attention as a mechanism to combat high SoC design costs and to economically manufacture large ASICs. These designs require low-power area-efficient off-die on-package die-to-die communication. Current technologies either extend on-die high-wire count buses using silicon interposers or off-package serial buses. The former approach leads to expensive packaging. The latter leads to complex and high-power designs. We propose a simple bunch-of-wires interface that combines ease of development with low-cost packaging techniques. We develop the interface and show how it can be used in multichiplet systems.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-31
Joshua Lant; Javier Navaridas; Mikel Luján; John Goodacre

HPC architects are currently facing myriad challenges from ever tighter power constraints and changing workload characteristics. In this article, we discuss the current state of FPGAs within HPC systems. Recent technological advances show that they are well placed for penetration into the HPC market. However, there are still a number of research problems to overcome; we address the requirements for system architectures and interconnects to enable their proper exploitation, highlighting the necessity of allowing FPGAs to act as full-fledged peers within a distributed system rather than attached to the CPU. We argue that this model requires a reliable, connectionless, hardware-offloaded transport supporting a global memory space. Our results show how our fully fledged hardware implementation gives latency improvements of up to 25% versus a software-based transport, and demonstrates that our solution can outperform the state of the art in HPC workloads such as matrix–matrix multiplication achieving a 10% higher computing throughput.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-30
Ammar Ahmad Awan; Arpan Jain; Ching-Hsiang Chu; Hari Subramoni; Dhableswar K. Panda

Heterogeneous high-performance computing systems with GPUs are equipped with high-performance interconnects like InfiniBand, Omni-Path, PCIe, and NVLink. However, little exists in the literature that captures the performance impact of these interconnects on distributed deep learning (DL). In this article, we choose Horovod, a distributed training middleware, to analyze and profile various DNN training workloads using TensorFlow and PyTorch in addition to standard MPI microbenchmarks. We use a wide variety of systems with CPUs like Intel Xeon and IBM POWER9, GPUs like Volta V100, and various interconnects to analyze the following metrics: 1) message-size with Horovod's tensor-fusion; 2) message-size without tensor-fusion; 3) number of MPI/NCCL calls; and 4) time taken by each MPI/NCCL call. We observed extreme performance variations for non-power-of-two message sizes on different platforms. To address this, we design a message-padding scheme for Horovod, illustrate significantly smoother allreduce latency profiles, and report cases where we observed improvement for end-to-end training.
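
The message-padding idea can be illustrated with a small sketch. The abstract does not specify the padding rule, so this assumes messages are padded with zero bytes up to the next power-of-two length; `next_power_of_two` and `pad_message` are hypothetical helper names and are not part of Horovod's API.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def pad_message(buf):
    """Pad a byte buffer with zeros to a power-of-two length (assumed rule).

    Returns the padded buffer and the original length, so the receiver can
    strip the padding after the collective completes.
    """
    padded_len = next_power_of_two(len(buf))
    return buf + b"\x00" * (padded_len - len(buf)), len(buf)


padded, original_len = pad_message(b"x" * 1000)
restored = padded[:original_len]
```

Zero padding is a natural choice for a sum-based allreduce because the extra elements do not affect the reduced values of the real payload.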

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-30
John Gliksberg; Antoine Capra; Alexandre Louvet; Pedro Javier García; Devan Sohier

Coupling regular topologies with optimized routing algorithms is key in pushing the performance of interconnection networks of supercomputers. In this article, we present Dmodc, a fast deterministic routing algorithm for parallel generalized fat trees (PGFTs), which minimizes congestion risk even under massive network degradation caused by equipment failure. Dmodc computes forwarding tables with a closed-form arithmetic formula by relying on a fast preprocessing phase. This allows complete rerouting of networks with tens of thousands of nodes in less than a second. In turn, this greatly helps centralized fabric management react to faults with high-quality routing tables and has no impact on running applications in current and future very large scale high-performance computing clusters.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-12-10
Sourav Roy; Arvind Kaushik; Rajkumar Agrawal; Joseph Gergen; Wim Rouwet; John Arends

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2020-01-14
Phillip Stanley-Marbell; Martin Rinard

In this article, we present Warp, the first open hardware platform designed explicitly to support research in approximate computing. Warp incorporates 21 sensors, computation, and circuit-level facilities designed explicitly to enable approximate computing research, in a 3.6 cm × 3.3 cm × 0.5 cm device. Warp supports a wide range of precision and accuracy versus power and performance tradeoffs.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-10-21
Hoda Mahdiani; Alireza Khadem; Azam Ghanbari; Mehdi Modarressi; Farima Fattahi-Bayat; Masoud Daneshtalab

The enormous and ever-increasing complexity of state-of-the-art neural networks has impeded the deployment of deep learning on resource-limited embedded and mobile devices. To reduce the complexity of neural networks, this article presents ΔNN, a power-efficient architecture that leverages a combination of the approximate value locality of neuron weights and algorithmic structure of neural networks. ΔNN keeps each weight as its difference (Δ) to the nearest smaller weight: each weight reuses the calculations of the smaller weight, followed by a calculation on the Δ value to make up the difference. We also round up/down the Δ to the closest power of two numbers to further reduce complexity. The experimental results show that ΔNN boosts the average performance by 14%–37% and reduces the average power consumption by 17%–49% over some state-of-the-art neural network designs.
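
The delta-based reuse described above can be sketched in software as follows. This is a rough illustration under stated assumptions, not the hardware design from the article: it uses integer weights, computes one full multiplication for the smallest weight, and rounds ties toward the smaller power of two. The names `nearest_power_of_two` and `delta_products` are hypothetical.

```python
def nearest_power_of_two(d):
    """Round a positive integer to the closest power of two (ties go down)."""
    if d <= 0:
        return 0
    lower = 1 << (d.bit_length() - 1)
    upper = lower << 1
    return lower if d - lower <= upper - d else upper


def delta_products(x, weights, approximate=False):
    """Compute x * w for every weight, reusing the product of the next-smaller weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    products = [0] * len(weights)
    prev_weight, prev_product = None, 0
    for i in order:
        if prev_weight is None:
            # Smallest weight: one full multiplication.
            prev_product = x * weights[i]
        else:
            delta = weights[i] - prev_weight
            if approximate:
                # With a power-of-two delta, x * delta is a shift in hardware.
                delta = nearest_power_of_two(delta)
            prev_product += x * delta
        products[i] = prev_product
        prev_weight = weights[i]
    return products


exact = delta_products(3, [7, 4, 6])                    # same values as 3 * w
approx = delta_products(3, [5, 8], approximate=True)    # delta 3 is rounded to 2
```

In this sketch the rounding error of each Δ carries over to all larger weights, since every product builds on the previous one; how the actual architecture bounds that error is not stated in the abstract.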

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-11-12
Han Cai; Ji Lin; Yujun Lin; Zhijian Liu; Kuan Wang; Tianzhe Wang; Ligeng Zhu; Song Han

Efficient deep learning inference requires algorithm and hardware codesign to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the neural architecture design makes the design space much larger: it is not only about designing the hardware architecture but also codesigning the neural architecture to fit the hardware architecture. It is difficult for human engineers to exhaust the design space by heuristics. We propose design automation techniques for architecting efficient neural networks given a target hardware platform. We investigate automatically designing specialized and fast models, auto channel pruning, and auto mixed-precision quantization. We demonstrate that such learning-based, automated design achieves better performance and efficiency than the rule-based human design. Moreover, we shorten the design cycle by 200× compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.

Updated: 2020-01-17
• IEEE Micro (IF 2.570) Pub Date : 2019-11-22
Masab Ahmad; Halit Dogan; José A. Joao; Omer Khan

In this article, the moving computation to data model (MC2D) is proposed to accelerate thread synchronization by pinning shared data to dedicated cores and utilizing in-hardware core-to-core messaging to communicate critical code execution. The MC2D model optimizes shared data locality by eliminating unnecessary data movement, and alleviates contended synchronization using nonblocking communication between threads. This article evaluates task-parallel algorithms under their synchronization-centric classification to demonstrate that the effectiveness of the MC2D model to exploit performance correlates with the number and frequency of synchronizations. The evaluation on the Tilera TILE-Gx72 multicore shows that the MC2D model delivers the highest performance scaling gains for ordered and unordered algorithms that expose significant synchronizations due to task and data level dependencies. The MC2D model is also shown to deliver performance on par with the traditional atomic operations based model for highly data parallel algorithms from the unordered category.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Nir Kshetri

Several blockchain-based financial technologies and cryptocurrencies have been launched for low-income people. Blockchain's technical potential can be used to serve the needs of unbanked and underbanked populations, but there is no evidence that these needs are being met.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Celia Paulsen

Most data breaches originate in the supply chain. This article seeks to identify policy, technology, and business environment changes that are shaping how organizations will source, buy, build, deliver, dispose of, and ultimately protect IT/operational technology goods and services in the next decade.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Rick Kuhn; Raghu N. Kacker; Yu Lei; Dimitris Simos

Testing is the most commonly used approach for software assurance, yet it remains as much judgment and art as science. We suggest that structural coverage measures must be supplemented with measures of input space coverage, providing a means of verifying that an adequate input model has been defined.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Loren E. Peitso; James Bret Michael

Augmented reality (AR) is a game-changing technology that lets users see things they cannot otherwise see. Shared reality could be used, among other applications, to improve the safety of traffic systems. Despite current limitations, the future is bright for interactive shared AR.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Munindar P. Singh; Amit K. Chopra

We propose a sociotechnical, yet computational, approach to building decentralized applications that accommodates and exploits blockchain technology. Our architecture incorporates the notion of a declarative, violable contract and enables flexible governance based on formal organizational structures, correctness verification without obstructing autonomy, and a basis for trust.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Simo Johannes Hosio; Niels van Berkel; Jonas Oppenlaender; Jorge Goncalves

The Diet Explorer is a lightweight system that relies on aggregated human insights for assessing and recommending suitable weight loss diets. We compared its performance against Google and suggest that the system, bootstrapped using a public crowdsourcing platform, provides comparable results in terms of overall satisfaction, relevance, and trustworthiness.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Arndt Lueder

New IT and computer science solutions are needed to ensure an increased flexibility of production systems. One key is using intelligent and self-responsible production system components, the Industry 4.0 components. In this column, relevant requirements, research and development trends, and issues still to be addressed are presented.

Updated: 2020-01-17
• Computer (IF 3.564) Pub Date : 2020-01-15
Nir Kshetri; Jeffrey Voas

The European Union General Data Protection Regulation (GDPR) has introduced online privacy and transparency for consumers as well as legal considerations that companies must address.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-09-27
Minsik Oh; Kwangsu Kim; Duheon Choi; Hyuk-Jun Lee; Eui-Young Chung

Recently, a hybrid cache consisting of SRAM and STT-RAM has attracted much attention as a future memory by complementing each other with different memory characteristics. Prior works focused on developing data allocation and migration techniques considering write-intensity to reduce write energy at STT-RAM. However, these works often neglect the impact of operation-specific reusability of a cache line. In this paper, we propose an energy-efficient per-operation reusability-based allocation and migration policy (ORAM) with a unified LRU replacement policy. First, to select an adequate memory type for allocation, we propose a cost function based on per-operation reusability – gain from an allocated cache line and loss from an evicted cache line for different memory types – which exploits the temporal locality. Besides, we present a migration policy, victim and target cache line selection scheme, to resolve memory type inconsistency between replacement policy and the allocation policy, with further energy reduction. Experiment results show an average energy reduction in the LLC and the main memory by 12.3 and 21.2 percent, and the improvement of latency and execution time by 21.2 and 8.8 percent, respectively, compared with a baseline hybrid cache management. In addition, the Energy-Delay Product (EDP) is improved by 36.9 percent over the baseline.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-04
Shinobu Miwa; Masaya Ishihara; Hayato Yamaki; Hiroki Honda; Martin Schulz

Power-efficiency has become one of the most critical concerns for HPC as we continue to scale computational capabilities. A significant fraction of system power is spent on large main memories, mainly caused by the substantial amount of DIMM standby power needed. However, while necessary for some workloads, for many workloads large memory configurations are too rich, i.e., these workloads only make use of a fraction of the available memory, causing unnecessary power usage. This observation opens new opportunities for power reduction by powering DIMMs on and off depending on the current workload. In this article, we propose footprint-based DIMM hotplug that enables a compute node to adjust the number of DIMMs that are powered on depending on the memory footprint of a running job. Our technique relies on two main subcomponents—memory footprint monitoring and DIMM management—which we both implement as part of an optimized page management system with small control overhead. Using Linux's memory hotplug capabilities, we implement our approach on a real system, and our results show that our proposed technique can save 50.6–52.1 percent of the DIMM standby energy and the CPU+DRAM energy of up to 1.50 Wh for various small-memory-footprint applications without loss of performance.
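
The core sizing decision behind footprint-based hotplug can be illustrated with simple arithmetic: keep only as many DIMMs powered as the job's footprint requires. The sketch below is an assumption-laden illustration, not the page management system from the article; `dimms_to_keep_online` and its `headroom_gib` parameter are hypothetical, and it assumes all DIMMs have the same capacity.

```python
from math import ceil


def dimms_to_keep_online(footprint_gib, dimm_gib, total_dimms, headroom_gib=0.0):
    """Number of equally sized DIMMs needed to hold the footprint plus headroom.

    The result is clamped so that at least one DIMM stays powered and the
    count never exceeds what the node physically has.
    """
    needed = ceil((footprint_gib + headroom_gib) / dimm_gib)
    return max(1, min(total_dimms, needed))


# A 40 GiB job with 4 GiB of headroom on a node with eight 16 GiB DIMMs
# needs ceil(44 / 16) = 3 DIMMs; the other five could be powered off.
online = dimms_to_keep_online(40, 16, 8, headroom_gib=4)
```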

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-04
Amit Kumar Singh; Karunakar Reddy Basireddy; Alok Prakash; Geoff V. Merrett; Bashir M. Al-Hashimi

Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are becoming prevalent in embedded computing, and they need to execute applications concurrently. However, existing run-time management approaches do not perform adaptive mapping and thread-partitioning of applications while exploiting both CPU and GPU cores at the same time. In this paper, we propose an adaptive mapping and thread-partitioning approach for energy-efficient execution of concurrent OpenCL applications on both CPU and GPU cores while satisfying performance requirements. To start execution of concurrent applications, the approach makes mapping (number of cores and operating frequencies) and partitioning (distribution of threads between CPU and GPU) decisions to satisfy performance requirements for each application. The mapping and partitioning decisions are made by having a collaboration between the CPU and GPU cores’ processing capabilities such that balanced execution can be performed. During execution, adaptation is triggered when new applications arrive or an executing one finishes and frees cores. The adaptation process identifies a new mapping and thread-partitioning in a similar collaborative manner for the remaining applications, provided it leads to an improvement in energy efficiency. The proposed approach is experimentally validated on the Odroid-XU3 hardware platform with varying sets of applications. Results show an average energy saving of 37% compared to existing approaches while satisfying the performance requirements.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-07
Johannes Bund; Christoph Lenzen; Moti Medina

Friedrichs et al. (TC 2018) showed that metastability can be contained when sorting inputs arising from time-to-digital converters, i.e., measurement values can be correctly sorted without resolving metastability using synchronizers first. However, this work left open whether this can be done by small circuits. We show that this is indeed possible, by providing a circuit that sorts Gray code inputs (possibly containing a metastable bit) and has asymptotically optimal depth and size. Our solution utilizes the parallel prefix computation (PPC) framework (JACM 1980). We improve this construction by bounding its fan-out by an arbitrary $f \geq 3$, without affecting depth and increasing circuit size by a small constant factor only. Thus, we obtain the first PPC circuits with asymptotically optimal size, constant fan-out, and optimal depth. To show that applying the PPC framework to the sorting task is feasible, we prove that the latter can, despite potential metastability, be decomposed such that the core operation is associative. We obtain asymptotically optimal metastability-containing sorting networks. We complement these results with simulations, independently verifying the correctness as well as small size and delay of our circuits. Proofs are omitted in this version; the article with full proofs is provided online at http://arxiv.org/abs/1911.00267.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-08
Shuibing He; Yanlong Yin; Xian-He Sun; Xuechen Zhang; Zongpeng Li

As the performance gap between processors and storage devices keeps increasing, I/O performance becomes a critical bottleneck of modern high-performance computing systems. In this paper, we propose a pattern-directed and layout-aware data replication design, named PDLA, to improve the performance of parallel I/O systems. PDLA includes an HDD-based scheme, H-PDLA, and an SSD-based scheme, S-PDLA. For applications with relatively low I/O concurrency, H-PDLA identifies access patterns of applications and makes a reorganized data replica for each access pattern on HDD-based servers with an optimized data layout. Moreover, to accommodate applications with high I/O concurrency, S-PDLA replicates critical access patterns that can bring performance benefits on SSD-based servers or on HDD-based and SSD-based servers. We have implemented the proposed replication scheme in the MPICH2 library on top of the OrangeFS file system. Experimental results show that H-PDLA can significantly improve the original parallel I/O system performance and demonstrate the advantages of S-PDLA over H-PDLA.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-11
Xiayang Wang; Fuqian Huang; Haibo Chen

Attackers often exploit memory corruption vulnerabilities to overwrite control data and further gain control over victim applications. Despite progress in advanced defensive techniques, such attacks still remain a major security threat. In this article, we present Niffler, a new technique that provides lightweight and practical defense against such attacks. Niffler eliminates the threat of memory corruption over control data by cloaking all control data in registers along its execution and only spilling them into a dedicated read-only area in memory upon a shortage of registers. As an attacker cannot directly overwrite any register or read-only memory pages, no direct memory corruption on control data is feasible. Niffler is made efficient by compactly encoding return address, balancing register allocation, dynamically determining register spilling and leveraging the recent Intel Memory Protection Extensions (MPX) for control data lookup during register restoring. We implement Niffler based on LLVM and conduct a set of evaluations on SPECCPU 2006 and real-world applications. Performance evaluation shows that Niffler introduces an average of only 6.3 percent overhead on SPECCPU 2006 C programs and an average of 28.2 percent overhead on C++ programs.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-11
Qianqian Fan; David J. Lilja; Sachin S. Sapatnekar

In the past few years, ever-increasing amounts of image data have been generated by users globally, and these images are routinely stored in cold storage systems in compressed formats. This article investigates the use of approximate storage that leverages the use of cheaper, lower reliability memories that can have higher error rates. Since traditional JPEG-based schemes based on variable-length coding are extremely sensitive to error, the direct use of approximate storage results in severe quality degradation. We propose an error-resilient adaptive-length coding (ALC) scheme that divides all symbols into two classes, based on their frequency of occurrence, where each class has a fixed-length codeword. This provides a balance between the reliability of fixed-length coding schemes, which have a high storage overhead, and the storage-efficiency of Huffman coding schemes, which show high levels of error on low-reliability storage platforms. Further, we use data partitioning to determine which bits are stored in approximate or reliable storage to lower the overall cost of storage. We show that ALC can be used with general non-volatile storage, and can substantially reduce the total cost compared to traditional JPEG-based storage.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-15
Abhishek Das; Nur A. Touba

With technology scaling, burst errors or clustered errors are becoming increasingly common in different types of memories. Multiple bit upsets due to particle strikes, write disturbance errors, and magnetic field coupling are a few of the mechanisms which cause clustered errors. In this article, a new class of single burst error correcting codes is presented which corrects a single burst of any size b within a codeword. A code construction methodology is presented which enables us to construct the proposed scheme from existing codes, e.g., Hamming codes. A new single step decoding methodology for the proposed class of codes is also presented which enables faster decoding. Different code constructions using Hamming codes and BCH codes are presented in this paper, and a comparison is made with existing schemes in terms of decoding complexity and data redundancy. The proposed scheme in all cases reduces the decoder complexity for little to no increase in data redundancy, specifically for higher burst error sizes.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-16
Kyuhwa Han; Hyukjoong Kim; Dongkun Shin

Recent advances in flash memory technology have reduced the cost-per-bit of flash storage devices such as solid-state drives (SSDs), thereby enabling the development of large-capacity SSDs for enterprise-scale storage. However, two major concerns arise in designing SSDs. First, the size of the address mapping table is increasing in proportion to the capacity of the SSD. The SSD-internal firmware, called flash translation layer (FTL), must maintain the address mapping table in the internal DRAM. Although the previously proposed demand map loading technique uses a small size of cached map table, the technique aggravates poor random performance. Second, there are many redundant writes in storage workloads, which have an adverse effect on the performance and lifetime of the SSD. For example, many transaction-supporting applications use the write-ahead-log (WAL) scheme, which writes the same data twice. To resolve these problems, we propose a novel transaction-supporting SSD, called WAL-SSD, which logs transaction data at the internally-managed WAL area and relocates the data atomically via the FTL-level remap operation at the transaction checkpointing. It can also be used to transform random write requests to sequential requests. We implemented a prototype of WAL-SSD with a real SSD device. Experiments demonstrate the performance improvement by WAL-SSD with three use cases: remap-journaling, atomic multi-block update, and random write logging.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-16
Javier D. Bruguera

Digit-recurrence algorithms are widely used in actual microprocessors to compute floating-point division and square root. These iterative algorithms present a good trade-off in terms of performance, area and power. We present a floating-point division and square root unit, which implements a radix-64 floating-point division and a radix-16 floating-point square root. To have an affordable implementation, each radix-64 division iteration and radix-16 square root iteration are made of simpler radix-4 iterations: 3 radix-4 iterations in division and 2 in square root. Speculation is used between consecutive radix-4 iterations to get a reduced timing. There are three different parts in digit-recurrence implementations: initialization, digit iterations, and rounding. The digit iteration is the iterative part and it uses the same logic for several cycles. Division and square root share partially the initialization and rounding stages, whereas each one has different logic for the digit iterations. The result is a low-latency floating-point divider and square root, requiring 11, 6, and 4 cycles for double, single and half-precision division with normalized operands and result, and 15, 8 and 5 cycles for square root. One or two additional cycles are needed in case of subnormal operand(s) or result.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-17
Cheng Chen; Qingsong Wei; Weng-Fai Wong; Chundong Wang

Modern file systems rely on the journaling mechanism to maintain crash consistency. The use of non-volatile memory (NVM) significantly improves the performance of journaling file systems. However, the superior performance of NVM will increase the likelihood of the journal filling up more often, thereby increasing the frequency of checkpointing. Together with the large amount of random checkpointing I/O found in most use cases, the checkpointing process becomes a new performance bottleneck. This paper proposes NV-Journaling, a strategy that reduces the frequency of checkpointing as well as reshapes the I/O pattern of checkpointing from one of random I/O to that which is more sequential I/O. NV-Journaling introduces fine-grained commits along with a cache-friendly NVM journaling layout that exploits the idiosyncrasies of NVM technology. Under this scheme, only the modified portion of a block, rather than the entire block, is written into the NVM journal device. Doing so significantly reduces checkpoint frequency and achieves better space utilization. NV-Journaling further reshapes the I/O pattern of checkpoint using a locality-aware checkpointing process. Checkpointed blocks are classified into hot and cold blocks. NV-Journaling maintains a hot block list to absorb repeated updates, and a cold bucket list to group blocks by their proximity on disk. When a checkpoint is required, cold buckets are selected such that blocks are sequentially flushed to the hard disk. We built a prototype of NV-Journaling by modifying the JBD2 layer in the Linux kernel and evaluated it using different workloads. Our experimental results show that NV-Journaling can improve performance by up to 4.3× compared to traditional journaling.

Updated: 2020-01-17
• IEEE Trans. Comput. (IF 3.131) Pub Date : 2019-10-23
Savvas Varsamopoulos; Koen Bertels; Carmen Garcia Almudever

Matching algorithms can be used for identifying errors in quantum systems, the most famous being the Blossom algorithm. Recent works have shown that small distance quantum error correction codes can be efficiently decoded by employing machine learning techniques based on neural networks (NN). Various NN-based decoders have been proposed to enhance the decoding performance and the decoding time. Their implementation differs in how the decoding is performed, at logical or physical level, as well as in several neural network related parameters. In this work, we implement and compare two NN-based decoders, a low level decoder and a high level decoder, and study how different NN parameters affect their decoding performance and execution time. Crucial parameters such as the size of the training dataset, the structure and the type of the neural network, and the learning rate used during training are discussed. After performing this comparison, we conclude that the high level decoder based on a Recurrent NN shows a better balance between decoding performance and execution time and is much easier to train. We then test its decoding performance for different code distances, probability datasets and under the depolarizing and circuit error models.

Updated: 2020-01-17
• arXiv.cs.LO Pub Date : 2020-01-16
Patrick Rodler

We challenge existing query-based ontology fault localization methods with respect to the assumptions they make, the criteria they optimize, and the interaction means they use. We find that their efficiency depends largely on the behavior of the interacting expert, that performed calculations can be inefficient or imprecise, and that used optimization criteria are often not fully realistic. As a remedy, we suggest a novel (and simpler) interaction approach which overcomes all identified problems and, in comprehensive experiments on faulty real-world ontologies, enables a successful fault localization while requiring fewer expert interactions in 66% of the cases, and always at least 80% less expert waiting time, compared to existing methods.

Updated: 2020-01-17
• arXiv.cs.LO Pub Date : 2020-01-16
E. M. Hahn; M. Perez; S. Schewe; F. Somenzi; A. Trivedi; D. Wojtczak

Recently, successful approaches have been made to exploit good-for-MDPs automata (Büchi automata with a restricted form of nondeterminism) for model free reinforcement learning, a class of automata that subsumes good for games automata and the most widespread class of limit deterministic automata. The foundation of using these Büchi automata is that the Büchi condition can, for good-for-MDP automata, be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.

Updated: 2020-01-17
• arXiv.cs.LO Pub Date : 2018-04-06
Sean Tull

We reconstruct finite-dimensional quantum theory from categorical principles. That is, we provide properties ensuring that a given physical theory described by a dagger compact category in which one may `discard' objects is equivalent to a generalised finite-dimensional quantum theory over a suitable ring $S$. The principles used resemble those due to Chiribella, D'Ariano and Perinotti. Unlike previous reconstructions, our axioms and proof are fully categorical in nature, in particular not requiring tomography assumptions. Specialising the result to probabilistic theories we obtain either traditional quantum theory with $S$ being the complex numbers, or that over real Hilbert spaces with $S$ being the reals.

Updated: 2020-01-17
• arXiv.cs.LO Pub Date : 2019-08-02
Saket Dingliwal; Ronak Agarwal; Happy Mittal; Parag Singla

Symmetry breaking is a popular technique to reduce the search space for SAT solving by exploiting the underlying symmetry over variables and clauses in a formula. The key idea is to first identify sets of assignments which fall in the same symmetry class, and then impose ordering constraints, called Symmetry Breaking Predicates (SBPs), such that only one (or a small subset) of these assignments is allowed to be a solution of the original SAT formula. While this technique has been exploited extensively in the SAT literature, there is little work on using symmetry breaking for SAT Modulo Theories (SMT). In SMT, logical constraints in SAT theories are combined with another set of theory operations defined over non-Boolean variables such as integers, reals, etc. SMT solvers typically use a combination of SAT solving techniques augmented with calls to the theory solver. In this work, we take up the advances in SAT symmetry breaking and apply them to the domain of SMT. Our key technical contribution is the formulation of symmetry breaking over the Boolean skeleton variables, which are placeholders for actual theory operations in SMT solving. These SBPs are then applied over the SAT solving part of the SMT solver. We implement our SBP ideas on top of CVC4, which is a state-of-the-art SMT solver. Our approach can result in significantly faster solutions on several benchmark problems compared to the state-of-the-art. Our final solver is a hybrid of the original CVC4 solver, and an SBP based solver, and can solve up to 3.8% and 3.1% more problems in the QF_NIA category of 2018 and 2019 SMT benchmarks, respectively, compared to CVC4, the top performer in this category.

Updated: 2020-01-17
• arXiv.cs.GT Pub Date : 2020-01-16
Deepanshu Vasal; Rajesh K Mishra; Sriram Vishwanath

In this paper, we present a sequential decomposition algorithm to compute graphon mean field equilibrium (GMFE) of dynamic graphon mean field game (GMFG). We consider a large population of players sequentially making strategic decisions where the actions of each player affect their neighbors which is captured in a graph, generated by a known graphon. Each player observes a private state and also a common information as a graphon mean-field population state which represents the empirical networked distribution of other players' types. We consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute GMFE that depend on both, a player's private type, and the current (dynamic) population state determined through the graphon. Each step in this algorithm consists of solving a fixed-point equation. We provide conditions on model parameters for which there exists such a GMFE. Using this algorithm, we obtain the GMFE for a specific security setup in cyber physical systems for different graphons that capture the interactions between the nodes in the system.

Updated: 2020-01-17
• arXiv.cs.GT Pub Date : 2020-01-16
Shivika Narang; Y Narahari

Stable matchings have been studied extensively in both economics and computer science literature. However, most of the work considers only integral matchings. The study of stable fractional matchings is fairly recent and moreover, is scarce. This paper reports the first investigation into the important but unexplored topic of incentive compatibility of matching mechanisms to find stable fractional matchings. We focus our attention on matching instances under strict preferences. First, we make the significant observation that there are matching instances for which no mechanism that produces a stable fractional matching is incentive compatible. We then characterize restricted settings of matching instances admitting unique stable fractional matchings. Specifically, we show that there will exist a unique stable fractional matching for a matching instance if and only if the given matching instance satisfies what we call the conditional mutual first preference property (CMFP). For this class of instances, we prove that every mechanism that produces the unique stable fractional matching is (a) incentive compatible and (b) resistant to coalitional manipulations. We provide a polynomial-time algorithm to compute the stable fractional matching as well. The algorithm uses envy-graphs, hitherto unused in the study of stable matchings.

Updated: 2020-01-17
• arXiv.cs.GT Pub Date : 2020-01-16
Shivika Narang

The blockchain concept forms the backbone of a new wave technology that promises to be deployed extensively in a wide variety of industrial and societal applications. Governments, financial institutions, banks, industrial supply chains, service companies, and even educational institutions and hospitals are investing in a substantial manner in the hope of improving business efficiency and operational robustness through deployment of blockchain technology. This thesis work is concerned with designing trustworthy business-to-business (B2B) market platforms drawing upon blockchain technology and game theory. The proposed platform is built upon three key ideas. First, we use permissioned blockchains with smart contracts as a technically sound approach for building the B2B platform. The blockchain deploys smart contracts that govern the interactions of enterprise buyers and sellers. Second, the smart contracts are designed using a rigorous analysis of a repeated game model of the strategic interactions between buyers and sellers. We show that such smart contracts induce honest behavior from buyers and sellers. Third, we embed cryptographic regulation protocols into the permissioned blockchain to ensure that business sensitive information is not revealed to the competitors. We believe our work is an important step in the direction of building a powerful B2B platform that maximizes social welfare and enables trusted collaboration between strategic enterprise agents.

Updated: 2020-01-17
• arXiv.cs.GT Pub Date : 2020-01-16
Chuang-Chieh Lin; Chi-Jen Lu; Po-An Chen

In this paper, we propose a simple and intuitive model to investigate the efficiency of the two-party election system, especially regarding the nomination process. Each of the two parties has its own candidates, and each of them brings utilities for the people including the supporters and non-supporters. In an election, each party nominates exactly one of its candidates to compete against the other party's. The candidate wins the election with higher odds if he or she brings more utility for all the people. We model such competition as a "two-party election game" such that each party is a player with two or more pure strategies corresponding to its potential candidates, and the payoff of each party is a mixed utility from a selected pair of competing candidates. By looking into the three models, namely, the linear link, Bradley-Terry, and the softmax models, which differ in how to formulate a candidate's winning odds against the competing candidate, we show that the two-party election game may neither have any pure Nash equilibrium nor a bounded price of anarchy. Nevertheless, by considering the conventional "egoism", which states that any candidate benefits his/her party's supporters more than any candidate from the competing party does, we prove that the two-party election game in both the linear link model and the softmax model always has pure Nash equilibria, and furthermore, the price of anarchy is constantly bounded.
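The three winning-odds models and the search for pure Nash equilibria can be sketched in Python as follows. The formulas below are the common textbook forms of a linear link, the Bradley-Terry model, and a two-way softmax; the abstract does not spell them out, so their exact parametrization (including the scale `b`), the candidate representation, and all function names are assumptions made for illustration.

```python
import math
from itertools import product


def linear_link(u_i, u_j, b):
    """Winning probability grows linearly in the utility gap (assumes |u_i - u_j| <= b)."""
    return 0.5 * (1.0 + (u_i - u_j) / b)


def bradley_terry(u_i, u_j, b):
    """Winning probability proportional to own total utility; `b` is unused."""
    return u_i / (u_i + u_j)


def softmax(u_i, u_j, b):
    """Equals exp(u_i/b) / (exp(u_i/b) + exp(u_j/b))."""
    return 1.0 / (1.0 + math.exp((u_j - u_i) / b))


def payoffs(cand_a, cand_b, win_prob, b):
    """Each candidate is (utility for A's supporters, utility for B's supporters)."""
    p = win_prob(sum(cand_a), sum(cand_b), b)  # probability that A's candidate wins
    pay_a = p * cand_a[0] + (1 - p) * cand_b[0]
    pay_b = p * cand_a[1] + (1 - p) * cand_b[1]
    return pay_a, pay_b


def pure_nash_equilibria(party_a, party_b, win_prob, b):
    """Brute-force search: keep profiles where neither party gains by deviating alone."""
    table = {
        (i, j): payoffs(party_a[i], party_b[j], win_prob, b)
        for i, j in product(range(len(party_a)), range(len(party_b)))
    }
    equilibria = []
    for (i, j), (pay_a, pay_b) in table.items():
        a_stays = all(table[(k, j)][0] <= pay_a for k in range(len(party_a)))
        b_stays = all(table[(i, k)][1] <= pay_b for k in range(len(party_b)))
        if a_stays and b_stays:
            equilibria.append((i, j))
    return equilibria
```

For example, with `party_a = [(6, 1), (5, 4)]` and `party_b = [(2, 5), (1, 6)]`, every candidate benefits their own party's supporters more than any opposing candidate does, which matches the "egoism" condition described in the abstract.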

Updated: 2020-01-17
• arXiv.cs.GT Pub Date : 2020-01-16
Nikoleta E. Glynatsi; Vincent A. Knight

The Iterated Prisoner's Dilemma has been used for decades as a model of behavioural interactions. From the celebrated performance of Tit for Tat, to the introduction of the zero-determinant strategies, to the use of sophisticated structures such as neural networks, the literature has been exploring the performance of strategies in the game for years. The results of the literature, however, have been relying on the performance of specific strategies in a finite number of tournaments. This manuscript evaluates 195 strategies' effectiveness in more than 40000 tournaments. The top ranked strategies are presented, and moreover, the impact of features on their success is analysed using machine learning techniques. The analysis determines that the cooperation ratio of a strategy in a given tournament compared to the mean and median cooperator is the most important feature. The conclusions are distinct for different types of tournaments. For instance, a strategy with a theory of mind would aim to be the mean/median cooperator in standard tournaments, whereas in tournaments with probabilistic ending it would aim to cooperate 10% of the times the median cooperator did.
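For readers unfamiliar with the game, a single match and the cooperation ratio of a player can be sketched in plain Python. The payoff values (3, 0, 5, 1) are the conventional ones from the literature and are an assumption here, as are the function names; this is not the tournament code used in the manuscript.

```python
# Conventional payoffs: (row player's score, column player's score).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(own_history, opponent_history):
    return "D"


def play_match(strategy_a, strategy_b, rounds):
    """Play a fixed-length match; return final scores and both move histories."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        gain_a, gain_b = PAYOFFS[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return (score_a, score_b), (history_a, history_b)


def cooperation_ratio(history):
    """Fraction of rounds in which the player cooperated."""
    return history.count("C") / len(history)
```

Calling `play_match(tit_for_tat, always_defect, 10)` shows Tit for Tat cooperating only in the first round, so its cooperation ratio depends strongly on the opponent it faces.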

Updated: 2020-01-17
• arXiv.cs.ET Pub Date : 2020-01-16
Yeheng Bo; Shuai Li; Peng Zhang; Juan Song; Xinjun Liu

Memristor neurons built with two memristors can be used to mimic many dynamical behaviours of a biological neuron. First, the dynamic operating conditions of memristor neurons and their transformation boundaries between spiking and bursting are comprehensively investigated. Then, the underlying mechanism of bursting is analysed and the controllability of the number of spikes in each burst period is demonstrated under proper input voltage and input resistor. Finally, the number of spikes per period is recognized as a carrier of neuron information and shown to enable pattern recognition and information transmission. These results show a promising approach for the efficient use of neuristors in the construction of neural networks.

Updated: 2020-01-17
• arXiv.cs.CE Pub Date : 2020-01-16
Karsten Paul; Christopher Zimmermann; Thang X. Duong; Roger A. Sauer

This work presents numerical techniques to enforce continuity constraints on multi-patch surfaces for three distinct problem classes. The first involves structural analysis of thin shells that are described by general Kirchhoff-Love kinematics. Their governing equation is a vector-valued, fourth-order, nonlinear, partial differential equation (PDE) that requires at least $C^1$-continuity within a displacement-based finite element formulation. The second class are surface phase separations modeled by a phase field. Their governing equation is the Cahn-Hilliard equation - a scalar, fourth-order, nonlinear PDE - that can be coupled to the thin shell PDE. The third class are brittle fracture processes modeled by a phase field approach. In this work, these are described by a scalar, fourth-order, nonlinear PDE that is similar to the Cahn-Hilliard equation and is also coupled to the thin shell PDE. Using a direct finite element discretization, the two phase field equations also require at least a $C^1$-continuous formulation. Isogeometric surface discretizations - often composed of multiple patches - thus require constraints that enforce the $C^1$-continuity of displacement and phase field. For this, two numerical strategies are presented: A Lagrange multiplier formulation and a penalty regularization. They are both implemented within the curvilinear shell and phase field formulations of Duong et al. (2017), Zimmermann et al. (2019) and Paul et al. (2019) and illustrated by several numerical examples. These consider deforming shells, phase separations on evolving surfaces, and dynamic brittle fracture.

Updated: 2020-01-17
• arXiv.cs.CE Pub Date : 2017-12-19
Dimitrios Loukrezis; Ulrich Römer; Herbert De Gersem

We consider the problem of quantifying uncertainty regarding the output of an electromagnetic field problem in the presence of a large number of uncertain input parameters. In order to reduce the growth in complexity with the number of dimensions, we employ a dimension-adaptive stochastic collocation method based on nested univariate nodes. We examine the accuracy and performance of collocation schemes based on Clenshaw-Curtis and Leja rules, for the cases of uniform and bounded, non-uniform random inputs, respectively. Based on numerical experiments with an academic electromagnetic field model, we compare the two rules in both the univariate and multivariate case and for both quadrature and interpolation purposes. Results for a real-world electromagnetic field application featuring high-dimensional input uncertainty are also presented.

Updated: 2020-01-17
• arXiv.cs.CC Pub Date : 2020-01-15
João F. Doriguello; Ashley Montanaro

In this work we revisit the Boolean Hidden Matching communication problem, which was the first communication problem in the one-way model to demonstrate an exponential classical-quantum communication separation. In this problem, Alice's bits are matched into pairs according to a partition that Bob holds. These pairs are compressed using a Parity function and it is promised that the final bit-string is equal either to another bit-string Bob holds, or its complement. The problem is to decide which case is the correct one. Here we generalize the Boolean Hidden Matching problem by replacing the parity function with an arbitrary function $f$. Efficient communication protocols are presented depending on the sign-degree of $f$. If its sign-degree is less than or equal to 1, we show an efficient classical protocol. If its sign-degree is less than or equal to $2$, we show an efficient quantum protocol. We then completely characterize the classical hardness of all symmetric functions $f$ of sign-degree greater than or equal to $2$, except for one family of specific cases. We also prove, via Fourier analysis, a classical lower bound for any function $f$ whose pure high degree is greater than or equal to $2$. Similarly, we prove, also via Fourier analysis, a quantum lower bound for any function $f$ whose pure high degree is greater than or equal to $3$. These results give a large family of new exponential classical-quantum communication separations.
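The problem statement can be made concrete with a small Python reference checker. It sees both parties' inputs at once, so it only defines the promise problem and is not a communication protocol. The pairwise interface for `f`, the instance generator, and all names are assumptions for illustration; the default `f` is parity, as in the original Boolean Hidden Matching problem.

```python
import random


def compress(x, matching, f):
    """Apply f to each matched pair of Alice's bits."""
    return [f(x[i], x[j]) for i, j in matching]


def decide(x, matching, w, f=lambda a, b: a ^ b):
    """Return 0 if the compressed string equals w, 1 if it equals w's complement."""
    z = compress(x, matching, f)
    if z == w:
        return 0
    if z == [1 - bit for bit in w]:
        return 1
    raise ValueError("promise violated: neither w nor its complement")


def make_instance(n, answer, rng, f=lambda a, b: a ^ b):
    """Build a random instance on n bits (n even) whose correct answer is `answer`."""
    x = [rng.randint(0, 1) for _ in range(n)]
    order = list(range(n))
    rng.shuffle(order)
    matching = [(order[k], order[k + 1]) for k in range(0, n, 2)]
    z = compress(x, matching, f)
    w = z if answer == 0 else [1 - bit for bit in z]
    return x, matching, w
```

Passing a different Boolean function as `f` gives the generalized problem described in the abstract; the hardness results then depend on properties of `f` such as its sign-degree.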

Updated: 2020-01-17
• arXiv.cs.CC Pub Date : 2020-01-15
Lars Jaffke; Mateus de Oliveira Oliveira; Hans Raj Tiwary

It can be shown that each permutation group $G \sqsubseteq S_n$ can be embedded, in a well defined sense, in a connected graph with $O(n+|G|)$ vertices. Some groups, however, require much fewer vertices. For instance, $S_n$ itself can be embedded in the $n$-clique $K_n$, a connected graph with $n$ vertices. In this work, we show that the minimum size of a context-free grammar generating a finite permutation group $G \sqsubseteq S_n$ can be upper bounded by three structural parameters of connected graphs embedding $G$: the number of vertices, the treewidth, and the maximum degree. More precisely, we show that any permutation group $G \sqsubseteq S_n$ that can be embedded into a connected graph with $m$ vertices, treewidth $k$, and maximum degree $\Delta$, can also be generated by a context-free grammar of size $2^{O(k\Delta\log\Delta)}\cdot m^{O(k)}$. By combining our upper bound with a connection between the extension complexity of a permutation group and the grammar complexity of a formal language, we also get that these permutation groups can be represented by polytopes of extension complexity $2^{O(k \Delta\log \Delta)}\cdot m^{O(k)}$. The above upper bounds can be used to provide trade-offs between the index of permutation groups, and the number of vertices, treewidth and maximum degree of connected graphs embedding these groups. In particular, by combining our main result with a celebrated $2^{\Omega(n)}$ lower bound on the grammar complexity of the symmetric group $S_n$ we have that connected graphs of treewidth $o(n/\log n)$ and maximum degree $o(n/\log n)$ embedding subgroups of $S_n$ of index $2^{cn}$ for some small constant $c$ must have $n^{\omega(1)}$ vertices. This lower bound can be improved to exponential on graphs of treewidth $n^{\varepsilon}$ for $\varepsilon<1$ and maximum degree $o(n/\log n)$.

Updated: 2020-01-17
• arXiv.cs.CC Pub Date : 2020-01-16
Marc Hellmuth; Carsten R. Seemann; Peter F. Stadler

Binary relations derived from labeled rooted trees play an important role in mathematical biology as formal models of evolutionary relationships. The (symmetrized) Fitch relation formalizes xenology as the pairs of genes separated by at least one horizontal transfer event. As a natural generalization, we consider symmetrized Fitch maps, that is, symmetric maps $\varepsilon$ that assign a subset of colors to each pair of vertices in $X$ and that can be explained by a tree $T$ with edges that are labeled with subsets of colors in the sense that the color $m$ appears in $\varepsilon(x,y)$ if and only if $m$ appears in a label along the unique path between $x$ and $y$ in $T$. We first give an alternative characterization of the monochromatic case and then give a characterization of symmetrized Fitch maps in terms of compatibility of a certain set of quartets. We show that recognition of symmetrized Fitch maps is NP-complete but FPT in general. In the restricted case where $|\varepsilon(x,y)|\leq 1$ the problem becomes polynomial, since such maps coincide with the class of monochromatic Fitch maps whose graph-representations form precisely the class of complete multi-partite graphs.

Updated: 2020-01-17
• arXiv.cs.CC Pub Date : 2018-12-19
Matthew Stephenson; Jochen Renz; Xiaoyu Ge

The physics-based simulation game Angry Birds has been heavily researched by the AI community over the past five years, and has been the subject of a popular AI competition that is currently held annually as part of a leading AI conference. Developing intelligent agents that can play this game effectively has been an incredibly complex and challenging problem for traditional AI techniques to solve, even though the game is simple enough that any human player could learn and master it within a short time. In this paper we analyse how hard the problem really is, presenting several proofs for the computational complexity of Angry Birds. By using a combination of several gadgets within this game's environment, we are able to demonstrate that the decision problem of solving general levels for different versions of Angry Birds is either NP-hard, PSPACE-hard, PSPACE-complete or EXPTIME-hard. Proof of NP-hardness is by reduction from 3-SAT, whilst proof of PSPACE-hardness is by reduction from True Quantified Boolean Formula (TQBF). Proof of EXPTIME-hardness is by reduction from G2, a known EXPTIME-complete problem similar to that used for many previous games such as Chess, Go and Checkers. To the best of our knowledge, this is the first time that a single-player game has been proven EXPTIME-hard. This is achieved by using stochastic game engine dynamics to effectively model the real world, or in our case the physics simulator, as the opponent against which we are playing. These proofs can also be extended to other physics-based games with similar mechanics.

Updated: 2020-01-17
• arXiv.cs.CC Pub Date : 2019-04-29
Thomas Bläsius; Philipp Fischbeck; Tobias Friedrich; Maximilian Katzmann

The VertexCover problem is proven to be computationally hard in different ways: It is NP-complete to find an optimal solution and even NP-hard to find an approximation with reasonable factors. In contrast, recent experiments suggest that on many real-world networks the run time to solve VertexCover is way smaller than even the best known FPT-approaches can explain. Similarly, greedy algorithms deliver very good approximations to the optimal solution in practice. We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability. The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that, on real instances, this leads to better approximations than the standard greedy approach within reasonable time.
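As a point of reference for the "standard greedy approach" mentioned above, a common max-degree greedy heuristic for VertexCover can be sketched in Python. This is an assumed baseline, not the authors' adaptive algorithm or their branch-and-reduce solver; the function names and the tie-breaking rule (smallest vertex label first) are illustrative choices.

```python
def greedy_vertex_cover(edges):
    """Repeatedly add a vertex of maximum remaining degree to the cover."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    cover = set()
    while any(adjacency.values()):
        # Ties are broken by sorted vertex order so the result is deterministic.
        best = max(sorted(adjacency), key=lambda x: len(adjacency[x]))
        cover.add(best)
        # Removing `best` covers all of its incident edges.
        for neighbor in adjacency.pop(best):
            adjacency[neighbor].discard(best)
    return cover


def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)
```

On graphs with a few very high-degree vertices, such as a star, this heuristic picks the hubs first, which is consistent with the abstract's observation that heterogeneous degree distributions make real-world instances easier in practice.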

Updated: 2020-01-17
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Ray-I Chang, Yu-Hsuan Chiu, Jeng-Wei Lin

Tuberculosis (TB) is one of the top 10 leading causes of death, so a computer-aided diagnosis system that accelerates TB diagnosis is crucial. In this paper, we apply convolutional neural networks and deep learning to classify images of the TB culture test, the gold standard among TB diagnostic tests. Since the dataset is small and imbalanced, a transfer learning approach is applied. Moreover, as the recall of the non-negative class is an important metric for this application, we propose a two-stage classification method to boost the results. The experimental results on a real dataset of TB culture tests (1727 samples with 16,503 images from Tao-Yuan General Hospital, Taiwan) show that the proposed method can achieve 99% precision and 98% recall on the non-negative class.

Updated: 2020-01-16
• J. Supercomput. (IF 2.157) Pub Date : 2020-01-16
Heithem Abbes, Thouraya Louati, Christophe Cérin

Infrastructure-as-a-service container-based virtualization is gaining interest as a platform for running distributed applications. With the increasing scale of cloud architectures, faults are becoming a frequent occurrence, which makes availability a true challenge. Replication, whether of checkpoints, containers, or data, is a method to survive failures and increase availability. Following a node failure, fault-tolerant cloud systems restart failed containers on a new node from distributed images of containers (or checkpoints). With a high failure rate, some replicas can be lost. In some cases it is worthwhile to increase the replication factor and to find the trade-off between restarting all failed containers and storage overhead. This paper addresses the issue of adapting the replication factor and contributes a novel replication factor modeling approach that predicts the right replication factor using prediction techniques. These techniques are based on experimental modeling, which analyzes data collected from different executions. We used a regression technique to find the relation between availability and the number of replicas. Experiments on the Grid’5000 testbed, using a real fault-tolerant cloud system, demonstrate the benefits of our proposal in satisfying the availability requirement.

Updated: 2020-01-16
Contents have been reproduced by permission of the publishers.
