• arXiv.cs.NE Pub Date : 2020-01-17
Iztok Fister Jr.; Suash Deb; Iztok Fister

Nowadays, the majority of data on the Internet is held in an unstructured format, like websites and e-mails. The importance of analyzing these data has been growing day by day. Similar to data mining on structured data, text mining methods for handling unstructured data have also received increasing attention from the research community. The paper deals with the problem of Association Rule Text Mining. To solve the problem, the PSO-ARTM method was proposed, that consists of three steps: Text preprocessing, Association Rule Text Mining using population-based metaheuristics, and text postprocessing. The method was applied to a transaction database obtained from professional triathlon athletes' blogs and news posted on their websites. The obtained results reveal that the proposed method is suitable for Association Rule Text Mining and, therefore, offers a promising way for further development.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-18
Zhengping Liang; Jian Zhang; Liang Feng; Zexuan Zhu

The placement scheme of virtual machines (VMs) to physical servers (PSs) is crucial to lowering operational cost for cloud providers. Evolutionary algorithms (EAs) have been performed promising-solving on virtual machine placement (VMP) problems in the past. However, as growing demand for cloud services, the existing EAs fail to implement in large-scale virtual machine placement (LVMP) problem due to the high time complexity and poor scalability. Recently, the multi-factorial optimization (MFO) technology has surfaced as a new search paradigm in evolutionary computing. It offers the ability to evolve multiple optimization tasks simultaneously during the evolutionary process. This paper aims to apply the MFO technology to the LVMP problem in heterogeneous environment. Firstly, we formulate a deployment cost based VMP problem in the form of the MFO problem. Then, a multi-factorial evolutionary algorithm (MFEA) embedded with greedy-based allocation operator is developed to address the established MFO problem. After that, a re-migration and merge operator is designed to offer the integrated solution of the LVMP problem from the solutions of MFO problem. To assess the effectiveness of our proposed method, the simulation experiments are carried on large-scale and extra large-scale VMs test data sets. The results show that compared with various heuristic methods, our method could shorten optimization time significantly and offer a competitive placement solution for the LVMP problem in heterogeneous environment.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-20
Roman Vershynin

Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks -- those with more connections than the size of the training data -- are often able to memorize the training data with $100\%$ accuracy. This was rigorously proved for networks with sigmoid activation functions and, very recently, for ReLU activations. Addressing a 1988 open question of Baum, we prove that this phenomenon holds for general multilayered perceptrons, i.e. neural networks with threshold activation functions, or with any mix of threshold and ReLU activations. Our construction is probabilistic and exploits sparsity.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-20
Yuri Lavinas; Claus Aranha; Marcelo Ladeira; Felipe Campelo

Recent studies on resource allocation suggest that some subproblems are more important than others in the context of the MOEA/D, and that focusing on the most relevant ones can consistently improve the performance of that algorithm. These studies share the common characteristic of updating only a fraction of the population at any given iteration of the algorithm. In this work we investigate a new, simpler partial update strategy, in which a random subset of solutions is selected at every iteration. The performance of the MOEA/D using this new resource allocation approach is compared experimentally against that of the standard MOEA/D-DE and the MOEA/D with relative improvement-based resource allocation. The results indicate that using the MOEA/D with this new partial update strategy results in improved HV and IGD values, and a much higher proportion of non-dominated solutions, particularly as the number of updated solutions at every iteration is reduced.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-20
Renoh Johnson Chalakkal; Faizal Hafiz; Waleed Abdulla; Akshya Swain

The present study proposes a new approach to automated screening of Clinically Significant Macular Edema (CSME) and addresses two major challenges associated with such screenings, i.e., exudate segmentation and imbalanced datasets. The proposed approach replaces the conventional exudate segmentation based feature extraction by combining a pre-trained deep neural network with meta-heuristic feature selection. A feature space over-sampling technique is being used to overcome the effects of skewed datasets and the screening is accomplished by a k-NN based classifier. The role of each data-processing step (e.g., class balancing, feature selection) and the effects of limiting the region-of-interest to fovea on the classification performance are critically analyzed. Finally, the selection and implication of operating point on Receiver Operating Characteristic curve are discussed. The results of this study convincingly demonstrate that by following these fundamental practices of machine learning, a basic k-NN based classifier could effectively accomplish the CSME screening.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-21
Hao Xu; Haibin Chang; Dongxiao Zhang

Data-driven methods have recently been developed to discover underlying partial differential equations (PDEs) of physical problems. However, for these methods, a complete candidate library of potential terms in a PDE are usually required. To overcome this limitation, we propose a novel framework combining deep learning and genetic algorithm, called DLGA-PDE, for discovering PDEs. In the proposed framework, a deep neural network that is trained with available data of a physical problem is utilized to generate meta-data and calculate derivatives, and the genetic algorithm is then employed to discover the underlying PDE. Owing to the merits of the genetic algorithm, such as mutation and crossover, DLGA-PDE can work with an incomplete candidate library. The proposed DLGA-PDE is tested for discovery of the Korteweg-de Vries (KdV) equation, the Burgers equation, the wave equation, and the Chaffee-Infante equation, respectively, for proof-of-concept. Satisfactory results are obtained without the need for a complete candidate library, even in the presence of noisy and limited data.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2018-05-21
Liang Luo; Jacob Nelson; Luis Ceze; Amar Phanishayee; Arvind Krishnamurthy

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high performance multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art distributed training techniques for cloud-based ImageNet workloads, with 25% better throughput per dollar.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2018-12-28
Denis Antipov; Benjamin Doerr

Despite significant progress in the theory of evolutionary algorithms, the theoretical understanding of evolutionary algorithms which use non-trivial populations remains challenging and only few rigorous results exist. Already for the most basic problem, the determination of the asymptotic runtime of the $(\mu+\lambda)$ evolutionary algorithm on the simple OneMax benchmark function, only the special cases $\mu=1$ and $\lambda=1$ have been solved. In this work, we analyze this long-standing problem and show the asymptotically tight result that the runtime $T$, the number of iterations until the optimum is found, satisfies $E[T] = \Theta\bigg(\frac{n\log n}{\lambda}+\frac{n}{\lambda / \mu} + \frac{n\log^+\log^+ \lambda/ \mu}{\log^+ \lambda / \mu}\bigg),$ where $\log^+ x := \max\{1, \log x\}$ for all $x > 0$. The same methods allow to improve the previous-best $O(\frac{n \log n}{\lambda} + n \log \lambda)$ runtime guarantee for the $(\lambda+\lambda)$~EA with fair parent selection to a tight $\Theta(\frac{n \log n}{\lambda} + n)$ runtime result.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2019-10-09
Haakon Robinson; Adil Rasheed; Omer San

In exchange for large quantities of data and processing power, deep neural networks have yielded models that provide state of the art predication capabilities in many fields. However, a lack of strong guarantees on their behaviour have raised concerns over their use in safety-critical applications. A first step to understanding these networks is to develop alternate representations that allow for further analysis. It has been shown that neural networks with piecewise affine activation functions are themselves piecewise affine, with their domains consisting of a vast number of linear regions. So far, the research on this topic has focused on counting the number of linear regions, rather than obtaining explicit piecewise affine representations. This work presents a novel algorithm that can compute the piecewise affine form of any fully connected neural network with rectified linear unit activations.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2019-10-24
Dongxu Li; Cristian Rodriguez Opazo; Xin Yu; Hongdong Li

Vision-based sign language recognition aims at helping deaf people to communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge, it is by far the largest public ASL dataset to facilitate word-level sign recognition research. Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performances in large scale scenarios. Specifically we implement and compare two different models,i.e., (i) holistic visual appearance-based approach, and (ii) 2D human pose based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution networks (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, which has further boosted the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performances up to 66% at top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. Our dataset and baseline deep models are available at \url{https://dxli94.github.io/WLASL/}.

更新日期：2020-01-22
• arXiv.cs.NE Pub Date : 2020-01-16
T. Serizawa; H. Fujita

Convolutional neural network (CNN) is one of the most frequently used deep learning techniques. Various forms of models have been proposed and improved for learning at CNN. When learning with CNN, it is necessary to determine the optimal hyperparameters. However, the number of hyperparameters is so large that it is difficult to do it manually, so much research has been done on automation. A method that uses metaheuristic algorithms is attracting attention in research on hyperparameter optimization. Metaheuristic algorithms are naturally inspired and include evolution strategies, genetic algorithms, antcolony optimization and particle swarm optimization. In particular, particle swarm optimization converges faster than genetic algorithms, and various models have been proposed. In this paper, we propose CNN hyperparameter optimization with linearly decreasing weight particle swarm optimization (LDWPSO). In the experiment, the MNIST data set and CIFAR-10 data set, which are often used as benchmark data sets, are used. By optimizing CNN hyperparameters with LDWPSO, learning the MNIST and CIFAR-10 datasets, we compare the accuracy with a standard CNN based on LeNet-5. As a result, when using the MNIST dataset, the baseline CNN is 94.02% at the 5th epoch, compared to 98.95% for LDWPSO CNN, which improves accuracy. When using the CIFAR-10 dataset, the Baseline CNN is 28.07% at the 10th epoch, compared to 69.37% for the LDWPSO CNN, which greatly improves accuracy.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-12-30
Takahiro Suzuki; Shingo Takeshita; Satoshi Ono

This paper proposes Evolutionary Multi-objective Optimization (EMO)-based Adversarial Example (AE) design method that performs under black-box setting. Previous gradient-based methods produce AEs by changing all pixels of a target image, while previous EC-based method changes small number of pixels to produce AEs. Thanks to EMO's property of population based-search, the proposed method produces various types of AEs involving ones locating between AEs generated by the previous two approaches, which helps to know the characteristics of a target model or to know unknown attack patterns. Experimental results showed the potential of the proposed method, e.g., it can generate robust AEs and, with the aid of DCT-based perturbation pattern generation, AEs for high resolution images.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2020-01-13
Nataliya Le Vine; Claus Horn; Matthew Zeigenfuse; Mark Rowan

In many industries, as well as in academic research, information is primarily transmitted in the form of unstructured documents (this article, for example). Hierarchically-related data is rendered as tables, and extracting information from tables in such documents presents a significant challenge. Many existing methods take a bottom-up approach, first integrating lines into cells, then cells into rows or columns, and finally inferring a structure from the resulting 2-D layout. But such approaches neglect the available prior information relating to table structure, namely that the table is merely an arbitrary representation of a latent logical structure. We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised skeleton' table form denoting approximate row and column borders without table content, then deriving latent table structure using xy-cut projection and Genetic Algorithm optimisation. The approach is easily adaptable to different table configurations and requires small data set sizes for training.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2020-01-16
Wei Hu; Lechao Xiao; Jeffrey Pennington

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance. Yet despite significant empirical and theoretical analysis, relatively little has been proved about the concrete effects of different initialization schemes. In this work, we analyze the effect of initialization in deep linear networks, and provide for the first time a rigorous proof that drawing the initial weights from the orthogonal group speeds up convergence relative to the standard Gaussian initialization with iid weights. We show that for deep networks, the width needed for efficient convergence to a global minimum with orthogonal initializations is independent of the depth, whereas the width needed for efficient convergence with Gaussian initializations scales linearly in the depth. Our results demonstrate how the benefits of a good initialization can persist throughout learning, suggesting an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-01-02
Kenji Kawaguchi; Leslie Pack Kaelbling

In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima, which provides new insight into the elimination of local minima and is applicable to analyze various models and transformations of objective functions beyond the elimination of local minima.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-05-30
Mostafa Elhoushi; Zihao Chen; Farhan Shafiq; Ye Henry Tian; Joey Yiwei Li

Deployment of convolutional neural networks (CNNs) in mobile environments, their high computation and power budgets proves to be a major bottleneck. Convolution layers and fully connected layers, because of their intense use of multiplications, are the dominant contributer to this computation budget. This paper proposes to tackle this problem by introducing two new operations: convolutional shifts and fully-connected shifts, that replace multiplications all together with bitwise shift and sign flipping instead. For inference, both approaches may require only 6 bits to represent the weights. This family of neural network architectures (that use convolutional shifts and fully-connected shifts) are referred to as DeepShift models. We propose two methods to train DeepShift models: DeepShift-Q that trains regular weights constrained to powers of 2, and DeepShift-PS that trains the values of the shifts and sign flips directly. Training the DeepShift versions of ResNet18 architecture from scratch, we obtained accuracies of 92.33% on CIFAR10 dataset, and Top-1/Top-5 accuracies of 65.63%/86.33% on Imagenet dataset. Training the DeepShift version of VGG16 on ImageNet from scratch, resulted in a drop of less than 0.3% in Top-5 accuracy. Converting the pre-trained 32-bit floating point baseline model of GoogleNet to DeepShift and training it for 3 epochs, resulted in a Top-1/Top-5 accuracies of 69.87%/89.62% that are actually higher than that of the original model. Further testing is made on various well-known CNN architectures. Last but not least, we implemented the convolutional shifts and fully-connected shift GPU kernels and showed a reduction in latency time of 25\% when inferring ResNet18 compared to an unoptimized multiplication-based GPU kernels. The code is available online at https://github.com/mostafaelhoushi/DeepShift.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-08-23
Alexander Ororbia

For energy-efficient computation in specialized neuromorphic hardware, we present the Spiking Neural Coding Network, an instantiation of a family of artificial neural models strongly motivated by the theory of predictive coding. The model, in essence, works by operating in a never-ending process of "guess-and-check", where neurons predict the activity values of one another and then immediately adjust their own activities to make better future predictions. The interactive, iterative nature of our neural system fits well into the continuous time formulation of data sensory stream prediction and, as we show, the model's structure yields a simple, local synaptic update rule, which could be used to complement or replace online spike-timing dependent plasticity. In this article, we experiment with an instantiation of our model that consists of leaky integrate-and-fire units. However, the general framework within which our model is situated can naturally incorporate more complex, formal neurons such as the Hodgkin-Huxley model. Our experimental results in pattern recognition demonstrate the potential of the proposed model when binary spike trains are the primary paradigm for inter-neuron communication. Notably, our model is competitive in terms of classification performance, can conduct online semi-supervised learning, naturally experiences less forgetting when learning from a sequence of tasks, and is more computationally economical and biologically-plausible than popular artificial neural networks.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-09-25
AkshatKumar Nigam; Pascal Friederich; Mario Krenn; Alán Aspuru-Guzik

Challenges in natural sciences can often be phrased as optimization problems. Machine learning techniques have recently been applied to solve such problems. One example in chemistry is the design of tailor-made organic materials and molecules, which requires efficient methods to explore the chemical space. We present a genetic algorithm (GA) that is enhanced with a neural network (DNN) based discriminator model to improve the diversity of generated molecules and at the same time steer the GA. We show that our algorithm outperforms other generative models in optimization tasks. We furthermore present a way to increase interpretability of genetic algorithms, which helped us to derive design principles.

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2019-10-15
W. Ronny Huang; Yike Qi; Qianqian Li; Jonathan Degange

Paper-intensive industries like insurance, law, and government have long leveraged optical character recognition (OCR) to automatically transcribe hordes of scanned documents into text strings for downstream processing. Even in 2019, there are still many scanned documents and mail that come into businesses in non-digital format. Text to be extracted from real world documents is often nestled inside rich formatting, such as tabular structures or forms with fill-in-the-blank boxes or underlines whose ink often touches or even strikes through the ink of the text itself. Further, the text region could have random ink smudges or spurious strokes. Such ink artifacts can severely interfere with the performance of recognition algorithms or other downstream processing tasks. In this work, we propose DeepErase, a neural-based preprocessor to erase ink artifacts from text images. We devise a method to programmatically assemble real text images and real artifacts into realistic-looking "dirty" text images, and use them to train an artifact segmentation network in a weakly supervised manner, since pixel-level annotations are automatically obtained during the assembly process. In addition to high segmentation accuracy, we show that our cleansed images achieve a significant boost in recognition accuracy by popular OCR software such as Tesseract 4.0. Finally, we test DeepErase on out-of-distribution datasets (NIST SDB) of scanned IRS tax return forms and achieve double-digit improvements in accuracy. All experiments are performed on both printed and handwritten text. Code for all experiments is available at https://github.com/yikeqicn/DeepErase

更新日期：2020-01-17
• arXiv.cs.NE Pub Date : 2020-01-14

Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers. The lack of inductive bias for arithmetic operations leaves neural networks without the underlying logic necessary to extrapolate on tasks such as addition, subtraction, and multiplication. We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction; and the Neural Multiplication Unit (NMU) that can multiply subsets of a vector. The NMU is, to our knowledge, the first arithmetic neural network component that can learn to multiply elements from a vector, when the hidden size is large. The two new components draw inspiration from a theoretical analysis of recently proposed arithmetic components. We find that careful initialization, restricting parameter space, and regularizing for sparsity is important when optimizing the NAU and NMU. Our proposed units NAU and NMU, compared with previous neural units, converge more consistently, have fewer parameters, learn faster, can converge for larger hidden sizes, obtain sparse and meaningful weights, and can extrapolate to negative and small values.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-14
Jake Gutierrez; Jacob Schrum

Generative Adversarial Networks (GANs) have demonstrated their ability to learn patterns in data and produce new exemplars similar to, but different from, their training set in several domains, including video games. However, GANs have a fixed output size, so creating levels of arbitrary size for a dungeon crawling game is difficult. GANs also have trouble encoding semantic requirements that make levels interesting and playable. This paper combines a GAN approach to generating individual rooms with a graph grammar approach to combining rooms into a dungeon. The GAN captures design principles of individual rooms, but the graph grammar organizes rooms into a global layout with a sequence of obstacles determined by a designer. Room data from The Legend of Zelda is used to train the GAN. This approach is validated by a user study, showing that GAN dungeons are as enjoyable to play as a level from the original game, and levels generated with a graph grammar alone. However, GAN dungeons have rooms considered more complex, and plain graph grammar's dungeons are considered least complex and challenging. Only the GAN approach creates an extensive supply of both layouts and rooms, where rooms span across the spectrum of those seen in the training set to new creations merging design principles from multiple rooms.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15
Frank Neumann; Andrew M. Sutton

This chapter compiles a number of results that apply the theory of parameterized algorithmics to the running-time analysis of randomized search heuristics such as evolutionary algorithms. The parameterized approach articulates the running time of algorithms solving combinatorial problems in finer detail than traditional approaches from classical complexity theory. We outline the main results and proof techniques for a collection of randomized search heuristics tasked to solve NP-hard combinatorial optimization problems such as finding a minimum vertex cover in a graph, finding a maximum leaf spanning tree in a graph, and the traveling salesperson problem.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15
Jiawei Zhang; Haopeng Zhang; Li Sun; Congying Xia

The dominant graph neural networks (GNNs) over-rely on the graph links, several serious performance problems with which have been witnessed already, e.g., suspended animation problem and over-smoothing problem. What's more, the inherently inter-connected nature precludes parallelization within the graph, which becomes critical for large-sized data input, as memory constraints limit batching across the nodes. In this paper, we will introduce a new graph neural network, namely GRAPH-BERT (Graph based BERT), solely based on the attention mechanism without any graph convolution or aggregation operators. Instead of fedd GRAPH-BERT with the complete large input graph, we propose to train GRAPH-BERT with sampled linkless subgraphs within the local context. In addition, the pre-trained GRAPH-BERT model can also be fine-tuned with additional output layers/functional components as the state-of-the-art if any supervised label information or certain application oriented objective is available. We have tested the effectiveness of GRAPH-BERT on several benchmark graph datasets. Based the pre-trained GRAPH-BERT with the node attribute reconstruction and structure recovery tasks, we further fine-tune GRAPH-BERT on node classification and graph clustering tasks specifically. The experimental results have demonstrated that GRAPH-BERT can out-perform the existing GNNs in both the learning effectiveness and efficiency.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15

We consider the fundamental problem of learning a single neuron $x \mapsto\sigma(w^\top x)$ using standard gradient methods. As opposed to previous works, which considered specific (and not always realistic) input distributions and activation functions $\sigma(\cdot)$, we ask whether a more general result is attainable, under milder assumptions. On the one hand, we show that some assumptions on the distribution and the activation function are necessary. On the other hand, we prove positive guarantees under mild assumptions, which go beyond those studied in the literature so far. We also point out and study the challenges in further strengthening and generalizing our results.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15
James D. Gadze; Kwame A. Agyekum; Stephen J. Nuagah; E. A. Affum

To maximize the benefits of LTE cellular networks, careful and proper planning is needed. This requires the use of accurate propagation models to quantify the path loss required for base station deployment. Deployed LTE networks in Ghana can barely meet the desired 100Mbps throughput leading to customer dissatisfaction. Network operators rely on transmission planning tools designed for generalized environments that come with already embedded propagation models suited to other environments. A challenge therefore to Ghanaian transmission Network planners will be choosing an accurate and precise propagation model that best suits the Ghanaian environment. Given this, extensive LTE path loss measurements at 800MHz and 2600MHz were taken in selected urban and suburban environments in Ghana and compared with 6 commonly used propagation models. Improved versions of the Ericson, SUI, and ECC-33 developed in this study predict more precisely the path loss in Ghanaian environments compared with commonly used propagation models.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15
Dhruv Khandelwal; Maarten Schoukens; Roland Tóth

Model structure and complexity selection remains a challenging problem in system identification, especially for parametric non-linear models. Many Evolutionary Algorithm (EA) based methods have been proposed in the literature for estimating model structure and complexity. In most cases, the proposed methods are devised for estimating structure and complexity within a specified model class and hence these methods do not extend to other model structures without significant changes. In this paper, we propose a Tree Adjoining Grammar (TAG) for stochastic parametric models. TAGs can be used to generate models in an EA framework while imposing desirable structural constraints and incorporating prior knowledge. In this paper, we propose a TAG that can systematically generate models ranging from FIRs to polynomial NARMAX models. Furthermore, we demonstrate that TAGs can be easily extended to more general model classes, such as the non-linear Box-Jenkins model class, enabling the realization of flexible and automatic model structure and complexity selection via EA.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-08
Yusuke Sakemi; Kai Morino; Takashi Morie; Kazuyuki Aihara

Spiking neural networks (SNNs) are brain-inspired mathematical models with the ability to process information in the form of spikes. SNNs are expected to provide not only new machine-learning algorithms, but also energy-efficient computational models when implemented in VLSI circuits. In this paper, we propose a novel supervised learning algorithm for SNNs based on temporal coding. A spiking neuron in this algorithm is designed to facilitate analog VLSI implementations with analog resistive memory, by which ultra-high energy efficiency can be achieved. We also propose several techniques to improve the performance on a recognition task, and show that the classification accuracy of the proposed algorithm is as high as that of the state-of-the-art temporal coding SNN algorithms on the MNIST dataset. Finally, we discuss the robustness of the proposed SNNs against variations that arise from the device manufacturing process and are unavoidable in analog VLSI implementation. We also propose a technique to suppress the effects of variations in the manufacturing process on the recognition performance.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-15
Erdem Kose

Target motion analysis using only bearing angles is an important study for tracking targets in water. Several methods including Kalman-like filters and evolutionary strategies are used to get a good predictor. Kalman-like filters couldn't get the expected results thus evolutionary strategies have been using in this area for a long time. Target Motion Analysis with Genetic Algorithm is the most successful method for Bearings-Only Target Motion Analysis and we investigated it. We found that Covariance Matrix Adaptation Evolutionary Strategies does the similar work with Target Motion Analysis with Genetic Algorithm and tried it; but it has statistical feedback mechanism and converges faster than other methods. In this study, we compared and criticize the methods.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2019-11-06
Firat Tuna

Functions are rich in meaning and can be interpreted in a variety of ways. Neural networks were proven to be capable of approximating a large class of functions[1]. In this paper, we propose a new class of neural networks called "Neural Network Processing Neural Networks" (NNPNNs), which inputs neural networks and numerical values, instead of just numerical values. Thus enabling neural networks to represent and process rich structures.

更新日期：2020-01-16
• arXiv.cs.NE Pub Date : 2020-01-13
William B. Langdon

random_tree() is a linear time and space C++ implementation able to create trees of up to a billion nodes for genetic programming and genetic improvement experiments. A 3.60GHz CPU can generate more than 18 million random nodes for GP program trees per second.

更新日期：2020-01-15
• arXiv.cs.NE Pub Date : 2018-10-25
Bo-Jian Hou; Zhi-Hua Zhou

The interpretability of deep learning models has raised extended attention these years. It will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks~(RNNs) especially gated RNNs whose inner mechanism is still not clearly understood. We find that Finite State Automaton~(FSA) that processes sequential data has more interpretable inner mechanism according to the definition of interpretability and can be learned from RNNs as the interpretable structure. We propose two methods to learn FSA from RNN based on two different clustering methods. With the learned FSA and via experiments on artificial and real datasets, we find that FSA is more trustable than the RNN from which it learned, which gives FSA a chance to substitute RNNs in applications involving humans' lives or dangerous facilities. Besides, we analyze how the number of gates affects the performance of RNN. Our result suggests that gate in RNN is important but the less the better, which could be a guidance to design other RNNs. Finally, we observe that the FSA learned from RNN gives semantic aggregated states and its transition graph shows us a very interesting vision of how RNNs intrinsically handle text classification tasks.

更新日期：2020-01-15
• arXiv.cs.NE Pub Date : 2020-01-10
Claudio Lucio do Val Lopes; Flávio Vinícius Cruzeiro Martins; Elizabeth Fialho Wanner

Dominance move (DoM) is a binary quality indicator that can be used in multiobjective optimization. It can compare solution sets while representing some important features such as convergence, spread, uniformity, and cardinality. DoM has an intuitive concept and considers the minimum move of one set needed to weakly Pareto dominate the other set. Despite the aforementioned properties, DoM is hard to calculate. The original formulation presents an efficient and exact method to calculate it in a biobjective case only. This work presents a new approach to calculate and extend DoM to deal with three or more objectives. The idea is to use a mixed integer programming (MIP) approach to calculate DoM. Some initial experiments, in the biobjective space, were done to verify the model correctness. Furthermore, other experiments, using three, five, and ten objective functions were done to show how the model behaves in higher dimensional cases. Algorithms such as IBEA, MOEAD, NSGAIII, NSGAII, and SPEA2 were used to generate the solution sets, however any other algorithms could be used with DoM indicator. The results have confirmed the effectiveness of the MIP DoM in problems with more than three objective functions. Final notes, considerations, and future research are discussed to exploit some solution sets particularities and improve the model and its use for other situations.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2020-01-09
Francisco Huhn; Luca Magri

We propose a physics-informed machine learning method to predict the time average of a chaotic attractor. The method is based on the hybrid echo state network (hESN). We assume that the system is ergodic, so the time average is equal to the ergodic average. Compared to conventional echo state networks (ESN) (purely data-driven), the hESN uses additional information from an incomplete, or imperfect, physical model. We evaluate the performance of the hESN and compare it to that of an ESN. This approach is demonstrated on a chaotic time-delayed thermoacoustic system, where the inclusion of a physical model significantly improves the accuracy of the prediction, reducing the relative error from 48% to 7%. This improvement is obtained at the low extra cost of solving two ordinary differential equations. This framework shows the potential of using machine learning techniques combined with prior physical knowledge to improve the prediction of time-averaged quantities in chaotic systems.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2020-01-13
Xuan Liu; Renato Gasoto; Cagdas Onal; Jie Fu

In this paper, we present a new locomotion control method for soft robot snakes. Inspired by biological snakes, our control architecture is composed of two key modules: A deep reinforcement learning (RL) module for achieving adaptive goal-reaching behaviors with changing goals, and a central pattern generator (CPG) system with Matsuoka oscillators for generating stable and diverse behavior patterns. The two modules are interconnected into a closed-loop system: The RL module, acting as the "brain", regulates the input of the CPG system based on state feedback from the robot. The output of the CPG system is then translated into pressure inputs to pneumatic actuators of a soft snake robot. Since the oscillation frequency and wave amplitude of the Matsuoka oscillator can be independently controlled under different time scales, we adapt the option-critic framework to improve the learning performance measured by optimality and data efficiency. We verify the performance of the proposed control method in experiments with both simulated and real snake robots.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2020-01-09
Shahab Shamshirband; Meisam Babanezhad; Amir Mosavi; Narjes Nabipour; Eva Hajnal; Laszlo Nadai; Kwok-Wing Chau

In order to perceive the behavior presented by the multiphase chemical reactors, the ant colony optimization algorithm was combined with computational fluid dynamics (CFD) data. This intelligent algorithm creates a probabilistic technique for computing flow and it can predict various levels of three-dimensional bubble column reactor (BCR). This artificial ant algorithm is mimicking real ant behavior. This method can anticipate the flow characteristics in the reactor using almost 30 % of the whole data in the domain. Following discovering the suitable parameters, the method is used for predicting the points not being simulated with CFD, which represent mesh refinement of Ant colony method. In addition, it is possible to anticipate the bubble-column reactors in the absence of numerical results or training of exact values of evaluated data. The major benefits include reduced computational costs and time savings. The results show a great agreement between ant colony prediction and CFD outputs in different sections of the BCR. The combination of ant colony system and neural network framework can provide the smart structure to estimate biological and nature physics base phenomena. The ant colony optimization algorithm (ACO) framework based on ant behavior can solve all local mathematical answers throughout 3D bubble column reactor. The integration of all local answers can provide the overall solution in the reactor for different characteristics. This new overview of modelling can illustrate new sight into biological behavior in nature.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2020-01-13
Zeyuan Allen-Zhu; Yuanzhi Li

How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this in terms of $\textit{hierarchical learning}$. We refer hierarchical learning as the learner learns to represent a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically simply by applying stochastic gradient descent (SGD). On the conceptual side, we present, to the best of our knowledge, the FIRST theory result indicating how very deep neural networks can still be sample and time efficient on certain hierarchical learning tasks, when NO KNOWN non-hierarchical algorithms (such as kernel method, linear regression over feature mappings, tensor decomposition, sparse coding) are efficient. We establish a new principle called "backward feature correction", which we believe is the key to understand the hierarchical learning in multi-layer neural networks. On the technical side, we show for regression and even for binary classification, for every input dimension $d > 0$, there is a concept class consisting of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any target function from this class in $\mathsf{poly}(d)$ time using $\mathsf{poly}(d)$ samples to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions. In contrast, we present lower bounds stating that several non-hierarchical learners, including any kernel methods, neural tangent kernels, must suffer from $d^{\omega(1)}$ sample or time complexity to learn functions in this concept class even to any $d^{-0.01}$ error.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-07-06
Weiwen Jiang; Lei Yang; Edwin Sha; Qingfeng Zhuge; Shouzhen Gu; Sakyasingha Dasgupta; Yiyu Shi; Jingtong Hu

We propose a novel hardware and software co-exploration framework for efficient neural architecture search (NAS). Different from existing hardware-aware NAS which assumes a fixed hardware design and explores the neural architecture search space only, our framework simultaneously explores both the architecture search space and the hardware design space to identify the best neural architecture and hardware pairs that maximize both test accuracy and hardware efficiency. Such a practice greatly opens up the design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs. The framework iteratively performs a two-level (fast and slow) exploration. Without lengthy training, the fast exploration can effectively fine-tune hyperparameters and prune inferior architectures in terms of hardware specifications, which significantly accelerates the NAS process. Then, the slow exploration trains candidates on a validation set and updates a controller using the reinforcement learning to maximize the expected accuracy together with the hardware efficiency. Experiments on ImageNet show that our co-exploration NAS can find the neural architectures and associated hardware design with the same accuracy, 35.24% higher throughput, 54.05% higher energy efficiency and 136x reduced search time, compared with the state-of-the-art hardware-aware NAS.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-09-24
Shibo Zhou; Ying Chen; Qiang Ye; Jingxi Li

Direct training based spiking neural networks (SNNs) have been paid a lot of attention recently because of its high energy efficiency on emerging neuromorphic hardware. However, due to the non-differentiability of the spiking activity, most of the related SNNs still cannot achieve high object recognition accuracy for the complicated dataset, such as CIFAR-10. Even though some of them can reach the accuracy of 90%, the energy consumption in those networks is very high. Considering this, we propose a direct supervised learning based spiking convolutional neural networks (SCNNs) using temporal coding scheme in this study, aiming to exploit minimum trainable parameters to recognize the object in the image with high accuracy. The MNIST and CIFAR-10 datasets are used to evaluate the performance of the proposed networks. For the MNIST dataset, the proposed networks with noise input are able to reach the high recognition accuracy (99.13%) as the other state-of-art models but use the much less trainable parameters than them. For CIFAR-10 dataset, the proposed networks with data augmentation step can reach the recognition accuracy of 80.49%., which is the state-of-art high accuracy in the field of direct training based SNNs using temporal coding manner. In addition, the number of trainable parameters used in such networks is much less than that in the conversion based SCNNs reported in the literature.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-09-25
Xingyou Song; Wenbo Gao; Yuxiang Yang; Krzysztof Choromanski; Aldo Pacchiano; Yunhao Tang

We introduce ES-MAML, a new framework for solving the model agnostic meta learning (MAML) problem based on Evolution Strategies (ES). Existing algorithms for MAML are based on policy gradients, and incur significant difficulties when attempting to estimate second derivatives using backpropagation on stochastic policies. We show how ES can be applied to MAML to obtain an algorithm which avoids the problem of estimating second derivatives, and is also conceptually simple and easy to implement. Moreover, ES-MAML can handle new types of nonsmooth adaptation operators, and other techniques for improving performance and estimation of ES methods become applicable. We show empirically that ES-MAML is competitive with existing methods and often yields better adaptation with fewer queries.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-11-09

Biological neurons use spikes to process and learn temporally dynamic inputs in an energy and computationally efficient way. However, applying the state-of-the-art gradient-based supervised algorithms to spiking neural networks (SNN) is a challenge due to the non-differentiability of the activation function of spiking neurons. Employing surrogate gradients is one of the main solutions to overcome this challenge. Although SNNs naturally work in the temporal domain, recent studies have focused on developing SNNs to solve static image categorization tasks. In this paper, we employ a surrogate gradient descent learning algorithm to recognize twelve human hand gestures recorded by dynamic vision sensor (DVS) cameras. The proposed SNN could reach 97.2% recognition accuracy on test data.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-12-09
Johannes Müller; Marius Zeinhofer

Recently, progress has been made in the application of neural networks to the numerical analysis of partial differential equations (PDEs). In the latter the variational formulation of the Poisson problem is used in order to obtain an objective function - a regularised Dirichlet energy - that was used for the optimisation of some neural networks. In this notes we use the notion of $\Gamma$-convergence to show that ReLU networks of growing architecture that are trained with respect to suitably regularised Dirichlet energies converge to the true solution of the Poisson problem. We discuss how this approach generalises to arbitrary variational problems under certain universality assumptions of neural networks and see that this covers some nonlinear stationary PDEs like the $p$-Laplace.

更新日期：2020-01-14
• arXiv.cs.NE Pub Date : 2019-12-29
Mahesh B. Patil; Ramakrishna U.; Mohan S. C

Multi-objective optimisation of damper placement in dynamically symmetric adjacent buildings is considered with identical viscoelastic dampers used for vibration control. First, exhaustive search is used to describe the solution space in terms of various quantities of interest such as maximum top floor displacement, maximum floor acceleration, base shear, and interstorey drift. With the help of examples, it is pointed out that the Pareto fronts in these problems contain a very small number of solutions. The effectiveness of two commonly used multi-objective evolutionary algorithms, viz., NSGA-II and MOPSO, is evaluated for a specific example.

更新日期：2020-01-13
• arXiv.cs.NE Pub Date : 2020-01-09
Stefan Horoi; Guillaume Lajoie; Guy Wolf

The efficiency of recurrent neural networks (RNNs) in dealing with sequential data has long been established. However, unlike deep, and convolution networks where we can attribute the recognition of a certain feature to every layer, it is unclear what "sub-task" a single recurrent step or layer accomplishes. Our work seeks to shed light onto how a vanilla RNN implements a simple classification task by analysing the dynamics of the network and the geometric properties of its hidden states. We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer. Furthermore the network's dynamics and the sequence length are both critical to correct classifications even when there is no additional task relevant information provided.

更新日期：2020-01-13
• arXiv.cs.NE Pub Date : 2020-01-10
Mathilde Caron; Ari Morcos; Piotr Bojanowski; Julien Mairal; Armand Joulin

Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on a task while trying to maintain the performance of the pruned network on the same task. However, in self-supervised feature learning, the training objective is agnostic on the representation transferability to downstream tasks. Thus, preserving performance for this objective does not ensure that the pruned subnetwork remains effective for solving downstream tasks. In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e. on self-supervised tasks). We show that pruned masks obtained with or without labels reach comparable performance when re-trained on labels, suggesting that pruning operates similarly for self-supervised and supervised learning. Interestingly, we also find that pruning preserves the transfer performance of self-supervised subnetwork representations.

更新日期：2020-01-13
• arXiv.cs.NE Pub Date : 2019-07-04
Marco Virgolin; Tanja Alderliesten; Peter A. N. Bosman

Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) Of small-enough number, to enable visualization of the behavior of the ML model; (2) Of small-enough size, to enable interpretability of the features themselves; (3) Of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.

更新日期：2020-01-13
• arXiv.cs.NE Pub Date : 2020-01-09

Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the quantization error. In this work, we focus on the binary quantization, in which values are mapped to -1 and 1. We introduce several novel quantization algorithms: optimal 1-bit, ternary, 2-bits, and greedy. Our quantization algorithms can be implemented efficiently on the hardware using bitwise operations. We present proofs to show that our proposed methods are optimal, and also provide empirical error analysis. We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed optimal quantization algorithms.

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2020-01-09
Patrick Murer; Hans-Andrea Loeliger

This paper studies the capability of a recurrent neural network model to memorize random dynamical firing patterns by a simple local learning rule. Two modes of learning/memorization are considered: The first mode is strictly online, with a single pass through the data, while the second mode uses multiple passes through the data. In both modes, the learning is strictly local (quasi-Hebbian): At any given time step, only the weights between the neurons firing (or supposed to be firing) at the previous time step and those firing (or supposed to be firing) at the present time step are modified. The main result of the paper is an upper bound on the probability that the single-pass memorization is not perfect. It follows that the memorization capacity in this mode asymptotically scales like that of the classical Hopfield model (which, in contrast, memorizes static patterns). However, multiple-rounds memorization is shown to achieve a higher capacity (with a nonvanishing number of bits per connection/synapse). These mathematical findings may be helpful for understanding the functions of short-term memory and long-term memory in neuroscience.

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2020-01-09
Frederik Rehbach; Martin Zaefferer; Boris Naujoks; Thomas Bartz-Beielstein

Surrogate-based optimization relies on so-called infill criteria (acquisition functions) to decide which point to evaluate next. When Kriging is used as the surrogate model of choice (also called Bayesian optimization), then one of the most frequently chosen criteria is expected improvement. Yet, we argue that the popularity of expected improvement largely relies on its theoretical properties rather than empirically validated performance. A few select results from the literature show evidence, that under certain conditions, expected improvement may perform worse than something as simple as the predicted value of a surrogate model. Both infill criteria are benchmarked in an extensive empirical study on the BBOB' function set. Importantly, this investigation includes a detailed study of the impact of problem dimensionality on algorithm performance. The results support the hypothesis that exploration loses importance with increasing problem dimensionality. A statistical analysis reveals that the purely exploitative search with the predicted value criterion performs better on most problems of five or more dimensions. Possible reasons for these results are discussed. We give an in-depth guide for choosing between both infill criteria based on a priori knowledge about a given problem, its dimensionality, and the available evaluation budget.

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2020-01-09
Andrew Anderson; Jing Su; Rozenn Dahyot; David Gregg

Hardware-Software Co-Design is a highly successful strategy for improving performance of domain-specific computing systems. We argue for the application of the same methodology to deep learning; specifically, we propose to extend neural architecture search with information about the hardware to ensure that the model designs produced are highly efficient in addition to the typical criteria around accuracy. Using the task of keyword spotting in audio on edge computing devices, we demonstrate that our approach results in neural architecture that is not only highly accurate, but also efficiently mapped to the computing platform which will perform the inference. Using our modified neural architecture search, we demonstrate $0.88\%$ increase in TOP-1 accuracy with $1.85\times$ reduction in latency for keyword spotting in audio on an embedded SoC, and $1.59\times$ on a high-end GPU.

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2019-10-08
Mahardhika Pratama; Choiru Za'in; Andri Ashfahani; Yew Soon Ong; Weiping Ding

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2019-12-17
Johnny Moughames; Xavier Porte; Michael Thiel; Gwenn Ulliac; Maxime Jacquot; Laurent Larger; Muamer Kadic; Daniel Brunner

Photonic waveguides are prime candidates for integrated and parallel photonic interconnects. Such interconnects correspond to large-scale vector matrix products, which are at the heart of neural network computation. However, parallel interconnect circuits realized in two dimensions, for example by lithography, are strongly limited in size due to disadvantageous scaling. We use three dimensional (3D) printed photonic waveguides to overcome this limitation. 3D optical-couplers with fractal topology efficiently connect large numbers of input and output channels, and we show that the substrate's footprint area scales linearly. Going beyond simple couplers, we introduce functional circuits for discrete spatial filters identical to those used in deep convolutional neural networks.

更新日期：2020-01-10
• arXiv.cs.NE Pub Date : 2019-11-25
Chnoor M. Rahman; Tarik A. Rashid

One of the most recently developed heuristic optimization algorithms is dragonfly by Mirjalili. Dragonfly algorithm has shown its ability to optimizing different real world problems. It has three variants. In this work, an overview of the algorithm and its variants is presented. Moreover, the hybridization versions of the algorithm are discussed. Furthermore, the results of the applications that utilized dragonfly algorithm in applied science are offered in the following area: Machine Learning, Image Processing, Wireless, and Networking. It is then compared with some other metaheuristic algorithms. In addition, the algorithm is tested on the CEC-C06 2019 benchmark functions. The results prove that the algorithm has great exploration ability and its convergence rate is better than other algorithms in the literature, such as PSO and GA. In general, in this survey the strong and weak points of the algorithm are discussed. Furthermore, some future works that will help in improving the algorithm's weak points are recommended. This study is conducted with the hope of offering beneficial information about dragonfly algorithm to the researchers who want to study the algorithm.

更新日期：2020-01-09
• arXiv.cs.NE Pub Date : 2019-07-03
Xiaolong Ma; Sheng Lin; Shaokai Ye; Zhezhi He; Linfeng Zhang; Geng Yuan; Sia Huat Tan; Zhengang Li; Deliang Fan; Xuehai Qian; Xue Lin; Kaisheng Ma; Yanzhi Wang

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. It motivates the intensive research on model compression with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured, which has higher flexibility and pruning rate but incurs index accesses due to irregular weights, or structured manner, which preserves the full matrix structure with lower pruning rate. Weight quantization leverages the redundancy in the number of bits in weights. Compared to pruning, quantization is much more hardware-friendly, and has become a "must-do" step for FPGA and ASIC implementations. This paper provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework. Second, we develop a methodology for fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x, and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss; (ii) we demonstrate the first fully binarized (for all layers) DNNs can be lossless in accuracy in many cases. These results provide a strong baseline and credibility of our study. Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structrued pruning is not competitive in terms of both storage and computation efficiency. Thus, we conclude that non-structured pruning is considered harmful. We urge the community not to continue the DNN inference acceleration for non-structured sparsity.

更新日期：2020-01-09
• arXiv.cs.NE Pub Date : 2019-10-21
Johannes Müller

Residual networks (ResNets) are a deep learning architecture with the recursive structure $x_{k+1} = x_k + R_k(x_k)$ where $R_k$ is a neural network and the copying of the input $x_k$ is called a skip connection. This structure can be seen as the explicit Euler discretisation of an associated ordinary differential equation. We use this interpretation to show that by simultaneously increasing the number of skip connection as well as the expressivity of the networks $R_k$ the flow of an arbitrary right hand side $f\in L^1\left( I; \mathcal C_b^{0, 1}(\mathbb R^d; \mathbb R^d)\right)$ can be approximated uniformly by deep ReLU ResNets on compact sets. Further, we derive estimates on the number of parameters needed to do this up to a prescribed accuracy under temporal regularity assumptions. Finally, we discuss the possibility of using ResNets for diffeomorphic matching problems and propose some next steps in the theoretical foundation of this approach.

更新日期：2020-01-09
• arXiv.cs.NE Pub Date : 2020-01-06
Zhong Meng; Yashesh Gaur; Jinyu Li; Yifan Gong

Predicting words and subword units (WSUs) as the output has shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the model parameters in a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400 hours Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative WER improvement over a strong AED baseline with 27.1% fewer model parameters.

更新日期：2020-01-08
• arXiv.cs.NE Pub Date : 2020-01-06
Zhong Meng; Jinyu Li; Yashesh Gaur; Yifan Gong

Teacher-student (T/S) has shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend the T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: teacher's token posteriors as soft labels and one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing from either the teacher's soft token posteriors or the one-hot ground-truth label, in AT/S, the student always learns from both the teacher and the ground truth with a pair of adaptive weights assigned to the soft and one-hot labels quantifying the confidence on each of the knowledge sources. The confidence scores are dynamically estimated at each decoder step as a function of the soft and one-hot labels. With 3400 hours parallel close-talk and far-field Microsoft Cortana data for domain adaptation, T/S and AT/S achieve 6.3% and 10.3% relative word error rate improvement over a strong E2E model trained with the same amount of far-field data.

更新日期：2020-01-08
• arXiv.cs.NE Pub Date : 2020-01-07
Xiaofeng Zhu; Feng Liu; Goce Trajcevski; Dingding Wang

Training a neural network model can be a lifelong learning process and is a computationally intensive one. A severe adverse effect that may occur in deep neural network models is that they can suffer from catastrophic forgetting during retraining on new data. To avoid such disruptions in the continuous learning, one appealing property is the additive nature of ensemble models. In this paper, we propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the catastrophic forgetting problem in tuning pre-trained neural network models.

更新日期：2020-01-08
• arXiv.cs.NE Pub Date : 2020-01-07
Andrei Velichko; Maksim Belyaev; Vadim Putrolaynen; Alexander Pergament; Valentin Perminov

In the present paper, we report on the switching dynamics of both single and coupled VO2-based oscillators, with resistive and capacitive coupling, and explore the capability of their application in oscillatory neural networks. Based on these results, we further select an adequate SPICE model to describe the modes of operation of coupled oscillator circuits. Physical mechanisms influencing the time of forward and reverse electrical switching, that determine the applicability limits of the proposed model, are identified. For the resistive coupling, it is shown that synchronization takes place at a certain value of the coupling resistance, though it is unstable and a synchronization failure occurs periodically. For the capacitive coupling, two synchronization modes, with weak and strong coupling, are found. The transition between these modes is accompanied by chaotic oscillations. A decrease in the width of the spectrum harmonics in the weak-coupling mode, and its increase in the strong-coupling one, is detected. The dependences of frequencies and phase differences of the coupled oscillatory circuits on the coupling capacitance are found. Examples of operation of coupled VO2 oscillators as a central pattern generator are demonstrated.

更新日期：2020-01-08
• arXiv.cs.NE Pub Date : 2019-11-24
Patrik Christen; Olivier Del Fabbro

As a discipline cybernetics has a long and rich history. In its first generation it not only had a worldwide span, in the area of computer modelling, for example, its proponents such as John von Neumann, Stanislaw Ulam, Warren McCulloch and Walter Pitts, also came up with models and methods such as cellular automata and artificial neural networks, which are still the foundation of most modern modelling approaches. At the same time, cybernetics also got the attention of philosophers, such as the Frenchman Gilbert Simondon, who made use of cybernetical concepts in order to establish a metaphysics and a natural philosophy of individuation, giving cybernetics thereby a philosophical interpretation, which he baptised allagmatic. In this paper, we emphasise this allagmatic theory by showing how Simondon's philosophical concepts can be used to formulate a generic computer model or metamodel for complex systems modelling and its implementation in program code, according to generic programming. We also present how the developed allagmatic metamodel is capable of building simple cellular automata and artificial neural networks.

更新日期：2020-01-08
• arXiv.cs.NE Pub Date : 2019-05-29
Sjoerd van Steenkiste; Francesco Locatello; Jürgen Schmidhuber; Olivier Bachem

A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven's Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better down-stream performance. In particular, they enable quicker learning using fewer samples.

更新日期：2020-01-08
Contents have been reproduced by permission of the publishers.

down
wechat
bug