• arXiv.cs.LG Pub Date : 2020-01-15
Kancharagunta Kishan Babu; Shiv Ram Dubey

Image-to-image transformation is a class of problems in which an input image in one visual representation is transformed into an output image in another visual representation. Since 2014, Generative Adversarial Networks (GANs) have opened a new direction for tackling this problem by introducing generator and discriminator networks in their architecture. Many recent works, such as Pix2Pix, CycleGAN, DualGAN, PS2MAN and CSGAN, handle this problem with the required generator and discriminator networks and a choice of different losses in the objective functions. Despite these works, there is still a gap to fill in terms of the quality of the generated images, which should look more realistic and be as close as possible to the ground truth images. In this work, we introduce a new image-to-image transformation network named Cyclic Discriminative Generative Adversarial Networks (CDGAN) that fills the above-mentioned gap. The proposed CDGAN generates high-quality and more realistic images by incorporating additional discriminator networks for cycled images on top of the original CycleGAN architecture. To demonstrate its performance, the proposed CDGAN is tested on three different baseline image-to-image transformation datasets. Quantitative metrics such as pixel-wise similarity, structural-level similarity and perceptual-level similarity are used to judge the performance. Moreover, the qualitative results are also analyzed and compared with the state-of-the-art methods. The proposed CDGAN method clearly outperformed all the state-of-the-art methods when compared on the three baseline image-to-image transformation datasets.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Li Cheng; Yijie Wang; Xinwang Liu; Bin Li

Feature selection plays an important role in improving the performance of outlier detection, especially for noisy data. Existing methods usually perform feature selection and outlier scoring separately, which can select feature subsets that do not optimally serve outlier detection, leading to unsatisfactory performance. In this paper, we propose an outlier detection ensemble framework with embedded feature selection (ODEFS) to address this issue. Specifically, for each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation to learn feature subsets that are tailored for the outlier detection method. Moreover, we adopt thresholded self-paced learning to simultaneously optimize feature selection and example selection, which helps improve the reliability of the training set. After that, we design an alternating algorithm with proven convergence to solve the resultant optimization problem. In addition, we analyze the generalization error bound of the proposed framework, which provides a theoretical guarantee on the method and insightful practical guidance. Comprehensive experimental results on 12 real-world datasets from diverse domains validate the superiority of the proposed ODEFS.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Andrea Valenti; Antonio Carta; Davide Bacciu

We address the challenging open problem of learning an effective latent space for symbolic music data in generative music modeling. We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information concerning music genre and style. Throughout the paper, we show how Gaussian mixtures that take music metadata information into account can be used as an effective prior for the autoencoder latent space, introducing the first Music Adversarial Autoencoder (MusAE). The empirical analysis on a large-scale benchmark shows that our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders. It is also able to create realistic interpolations between two musical sequences, smoothly changing the dynamics of the different tracks. Experiments show that the model can organise its latent space according to low-level properties of the musical pieces, as well as embed into the latent variables the high-level genre information injected from the prior distribution to increase its overall performance. This allows us to perform changes to the generated pieces in a principled way.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Pinkesh Badjatiya; Manish Gupta; Vasudeva Varma

With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for structured datasets, but we aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias for any model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalization provides an effective way to encode knowledge because the abstraction it provides not only generalizes content but also facilitates retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and on mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Max Hopkins; Daniel Kane; Shachar Lovett; Gaurav Mahajan

With the explosion of massive, widely available unlabeled data in the past years, finding label and time efficient, robust learning algorithms has become ever more important in theory and in practice. We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label in the hope of exponentially increasing efficiency. By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. We further provide algorithms for a generalization of the popular Tsybakov low noise condition, and show how comparisons provide a strong reliability guarantee that is often impractical or impossible with only labels - returning a classifier that makes no errors with high probability.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
David M. Burns; Cari M. Whyne

A significant challenge for a supervised learning approach to inertial human activity recognition is the heterogeneity of data between individual users, resulting in very poor performance of impersonal algorithms for some subjects. We present an approach to personalized activity recognition based on deep embeddings derived from a fully convolutional neural network. We experiment with both categorical cross entropy loss and triplet loss for training the embedding, and describe a novel triplet loss function based on subject triplets. We evaluate these methods on three publicly available inertial human activity recognition data sets (MHEALTH, WISDM, and SPAR) comparing classification accuracy, out-of-distribution activity detection, and embedding generalization to new activities. The novel subject triplet loss provides the best performance overall, and all personalized deep embeddings out-perform our baseline personalized engineered feature embedding and an impersonal fully convolutional neural network classifier.
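
The triplet loss mentioned above can be illustrated with a minimal sketch. The loss itself is the standard margin-based form; the `subject_triplets` helper is a hypothetical reading of "subject triplets" (anchor, positive and negative all drawn from the same subject) and is an assumption, not the authors' definition. The sample layout and the margin value are also assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def subject_triplets(samples):
    """Assumed selection rule: all three members come from one subject.

    samples: list of (subject_id, activity_label, embedding) tuples (hypothetical layout).
    """
    triplets = []
    for i, (s_a, y_a, e_a) in enumerate(samples):
        for j, (s_p, y_p, e_p) in enumerate(samples):
            # Positive: a different sample, same subject, same activity.
            if i == j or s_p != s_a or y_p != y_a:
                continue
            for s_n, y_n, e_n in samples:
                # Negative: same subject, different activity.
                if s_n == s_a and y_n != y_a:
                    triplets.append((e_a, e_p, e_n))
    return triplets
```

Restricting triplets to a single subject means the embedding is pushed to separate activities within a person rather than to separate people, which is one plausible motivation for a personalized model.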

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Huy Phan; Ian V. McLoughlin; Lam Pham; Oliver Y. Chén; Philipp Koch; Maarten De Vos; Alfred Mertins

Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. Most, if not all, existing speech enhancement GANs (SEGANs) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose two novel SEGAN frameworks, iterated SEGAN (ISEGAN) and deep SEGAN (DSEGAN). In the two proposed frameworks, the GAN architectures are composed of multiple generators that are chained to accomplish multiple-stage enhancement mapping which gradually refines the noisy input signals in stage-wise fashion. On the one hand, ISEGAN's generators share their parameters to learn an iterative enhancement mapping. On the other hand, DSEGAN's generators share a common architecture but their parameters are independent; as a result, different enhancement mappings are learned at different stages of the network. We empirically demonstrate favorable results obtained by the proposed ISEGAN and DSEGAN frameworks over the vanilla SEGAN. The source code is available at http://github.com/pquochuy/idsegan.
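
The difference between the two chaining schemes can be sketched structurally. This is only an illustration of parameter sharing versus independent stages; the toy generator is a hypothetical stand-in and ignores everything a real SEGAN generator does (latent codes, adversarial training, network architecture).

```python
def iterated_enhance(generator, noisy, n_stages):
    """ISEGAN-style chain: one generator (shared parameters) applied n_stages times."""
    outputs = []
    signal = noisy
    for _ in range(n_stages):
        signal = generator(signal)
        outputs.append(signal)
    return outputs  # one refined signal per stage

def deep_enhance(generators, noisy):
    """DSEGAN-style chain: one independent generator per stage."""
    outputs = []
    signal = noisy
    for generator in generators:
        signal = generator(signal)
        outputs.append(signal)
    return outputs

def make_toy_generator(strength):
    """Hypothetical stand-in for a trained generator: shrinks samples toward their mean."""
    def generator(signal):
        mean = sum(signal) / len(signal)
        return [mean + (1.0 - strength) * (s - mean) for s in signal]
    return generator
```

For example, `iterated_enhance(make_toy_generator(0.5), noisy, 2)` reuses one mapping twice, while `deep_enhance([make_toy_generator(0.5), make_toy_generator(0.2)], noisy)` lets each stage apply a different mapping.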

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Conghui Tan; Yuqiu Qian; Shiqian Ma; Tong Zhang

Dual averaging-type methods are widely used in industrial machine learning applications due to their ability to promote solution structure (e.g., sparsity) efficiently. In this paper, we propose a novel accelerated dual-averaging primal-dual algorithm for minimizing a composite convex function. We also derive a stochastic version of the proposed method which solves empirical risk minimization, and its advantages in handling sparse data are demonstrated both theoretically and empirically.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Laura Ruis; Mitchell Stern; Julia Proskurnia; William Chan

We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation. The model consists of two phases that are executed iteratively, 1) an insertion phase and 2) a deletion phase. The insertion phase parameterizes a distribution of insertions on the current output hypothesis, while the deletion phase parameterizes a distribution of deletions over the current output hypothesis. The training method is a principled and simple algorithm, where the deletion model obtains its signal directly on-policy from the insertion model output. We demonstrate the effectiveness of our Insertion-Deletion Transformer on synthetic translation tasks, obtaining significant BLEU score improvement over an insertion-only model.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Vinay Kumar Verma; Pravendra Singh; Vinay P. Namboodiri; Piyush Rai

We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not have any significant contribution to the model and can therefore be pruned. The pruner network has the same architecture as the original network except that it has a multitask/multi-output last layer containing binary-valued outputs (one per filter), which indicate which filters have to be pruned. The pruner's goal is to minimize the number of filters from the original network by assigning zero weights to the corresponding output feature-maps. In contrast to most of the existing methods, instead of relying on iterative pruning, our approach can prune the original network in one go and, moreover, does not require specifying the degree of pruning for each layer (and can learn it instead). The compressed model produced by our approach is generic and does not need any special hardware/software support. Moreover, augmenting with other methods such as knowledge distillation, quantization, and connection pruning can increase the degree of compression for the proposed approach. We show the efficacy of our proposed approach for classification and object detection tasks.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Esra Kaya; İsmail Sarıtaş; Ilker Ali Ozkan

In this study, a supervised retina blood vessel segmentation process was performed on the green channel of the RGB image using an artificial neural network (ANN). The green channel is preferred because the retinal vessel structures can be distinguished most clearly in the green channel of the RGB image. The study was performed using 20 images from the DRIVE data set, which is one of the most commonly used retina data sets. The images went through preprocessing stages such as contrast-limited adaptive histogram equalization (CLAHE), color intensity adjustment, morphological operations, and median and Gaussian filtering to obtain a good segmentation. Retinal vessel structures were highlighted with top-hat and bottom-hat morphological operations and converted to a binary image by global thresholding. Then, the network was trained on the binary versions of the images specified as training images in the data set, with the images segmented manually by a specialist as targets. The average segmentation accuracy for the 20 images was 0.9492.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Haik Manukian; Yan Ru Pei; Sean R. B. Bearden; Massimiliano Di Ventra

Restricted Boltzmann machines (RBMs) are a powerful class of generative models, but their training requires computing a gradient that, unlike supervised backpropagation on typical loss functions, is notoriously difficult even to approximate. Here, we show that properly combining standard gradient updates with an off-gradient direction, constructed from samples of the RBM ground state (mode), improves their training dramatically over traditional gradient methods. This approach, which we call mode training, promotes faster training and stability, in addition to lower converged relative entropy (KL divergence). Along with the proofs of stability and convergence of this method, we also demonstrate its efficacy on synthetic datasets where we can compute KL divergences exactly, as well as on a larger machine learning standard, MNIST. The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures such as deep, convolutional and unrestricted Boltzmann machines.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Shervin Minaee; Yuri Boykov; Fatih Porikli; Antonio Plaza; Nasser Kehtarnavaz; Demetri Terzopoulos

Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial number of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarities, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Nimar S. Arora; Nazanin Khosravani Tehrani; Kinjal Divesh Shah; Michael Tingley; Yucen Lily Li; Narjes Torabi; David Noursi; Sepehr Akhavan Masouleh; Eric Lippert; Erik Meijer

Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step. Structured relational models are a good candidate for this style of inference. In the single-site context, second order methods become feasible because the typical cubic costs associated with these methods is now restricted to the dimension of each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a method to improve MCMC convergence by analyzing the first and second order gradients of the target density to determine a suitable proposal density at each point. Existing first order gradient-based methods suffer from the problem of determining an appropriate step size. Too small a step size and it will take a large number of steps to converge, while a very large step size will cause it to overshoot the high density region. NMC is similar to the Newton-Raphson update in optimization where the second order gradient is used to automatically scale the step size in each dimension. However, our objective is to find a parameterized proposal density rather than the maxima. As a further improvement on existing first and second order methods, we show that random variables with constrained supports don't need to be transformed before taking a gradient step. We demonstrate the efficiency of NMC on a number of different domains. For statistical models where the prior is conjugate to the likelihood, our method recovers the posterior quite trivially in one step. However, we also show results on fairly large non-conjugate models, where NMC performs better than adaptive first order methods such as NUTS or other inexact scalable inference methods such as Stochastic Variational Inference or bootstrapping.
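
The claim that a conjugate model is recovered in one step can be checked with a small sketch. It assumes the simplest reading of the idea for a single unconstrained real-valued coordinate: a Normal proposal with mean x - g/h and variance -1/h, where g and h are the first and second derivatives of the log density at x. The function names and the normal-normal model below are illustrative assumptions; the paper's treatment of constrained supports is not shown.

```python
def newton_proposal(grad, hess, x):
    """Return (mean, variance) of a Normal proposal built from a Newton step at x.

    Assumes a one-dimensional, unconstrained coordinate whose log density is
    locally concave at x (negative second derivative).
    """
    g, h = grad(x), hess(x)
    if h >= 0:
        raise ValueError("log density is not locally concave at x")
    return x - g / h, -1.0 / h

def make_normal_normal(ys, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Derivatives of the log posterior for theta ~ N(prior), y_i ~ N(theta, noise_var)."""
    def grad(theta):
        return (-(theta - prior_mean) / prior_var
                + sum(y - theta for y in ys) / noise_var)

    def hess(theta):
        return -1.0 / prior_var - len(ys) / noise_var

    return grad, hess

# With observations 1, 2, 3 the exact posterior is N(1.5, 0.25).
grad, hess = make_normal_normal([1.0, 2.0, 3.0])
mean, var = newton_proposal(grad, hess, x=10.0)
```

Because the log posterior here is exactly quadratic, the proposal built at any starting point coincides with the posterior itself, which is the one-step recovery the abstract describes for conjugate models.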

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Jan Brabec; Tomáš Komárek; Vojtěch Franc; Lukáš Machlica

Many real-world classification problems are significantly class-imbalanced to the detriment of the class of interest. The standard set of proper evaluation metrics is well-known, but the usual assumption is that the test dataset imbalance equals the real-world imbalance. In practice, this assumption is often broken for various reasons. The reported results are then often too optimistic and may lead to wrong conclusions about the industrial impact and suitability of proposed techniques. We introduce methods focusing on evaluation under non-constant class imbalance. We show that not only the absolute values of commonly used metrics, but even the order of classifiers in relation to the evaluation metric used, is affected by a change of the imbalance rate. Finally, we demonstrate that using subsampling in order to get a test dataset with class imbalance equal to the one observed in the wild is not necessary, and can lead to significant errors in the classifier's performance estimate.
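
The dependence of metrics on the imbalance rate can be made concrete with a standard identity rather than the paper's own procedure: at a fixed operating point, precision follows from the true positive rate, the false positive rate and the positive class prior by Bayes' rule. The sketch below assumes a nonzero true positive rate, and the two operating points used in the note are made up for illustration.

```python
def precision_at_prior(tpr, fpr, pos_rate):
    """Precision implied by (TPR, FPR) when a fraction pos_rate of samples is positive."""
    tp = tpr * pos_rate            # expected true positives per sample
    fp = fpr * (1.0 - pos_rate)    # expected false positives per sample
    return tp / (tp + fp)

def f1_at_prior(tpr, fpr, pos_rate):
    """F1 score implied by (TPR, FPR) at the given positive rate (recall equals TPR)."""
    p = precision_at_prior(tpr, fpr, pos_rate)
    return 2 * p * tpr / (p + tpr)
```

Consider a hypothetical classifier A with TPR 0.9 and FPR 0.1, and a classifier B with TPR 0.6 and FPR 0.01. On a balanced test set A has the higher F1, while at a positive rate of 1% B has the higher F1, so the ranking depends on the imbalance. Since TPR and FPR do not depend on the class prior, metrics for a different imbalance can be recomputed this way without subsampling the test set.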

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-14
Oliver Urbann; Simon Camphausen; Arne Moos; Ingmar Schwarz; Sören Kerner; Maximilian Otten

Inference of Convolutional Neural Networks in time-critical applications usually requires a GPU. In robotics or embedded devices these are often not available due to energy, space and cost constraints. Furthermore, installation of a deep learning framework or even a native compiler on the target platform is not possible. This paper presents a neural network code generator (NNCG) that generates from a trained CNN a plain ANSI C code file that encapsulates the inference in a single function. It can easily be included in existing projects and, due to the lack of dependencies, cross compilation is usually possible. Additionally, the code generation is optimized based on the known trained CNN and target platform following four design principles. The system is evaluated utilizing small CNNs designed for this application. Compared to TensorFlow XLA and Glow, speed-ups of up to 11.81 can be shown, and even GPUs are outperformed regarding latency.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-13
Michael Hind; Dennis Wei; Yunfeng Zhang

Many proposed methods for explaining machine learning predictions are in fact challenging to understand for nontechnical consumers. This paper builds upon an alternative consumer-driven approach called TED that asks for explanations to be provided in training data, along with target labels. Using semi-synthetic data from credit approval and employee retention applications, experiments are conducted to investigate some practical considerations with TED, including its performance with different classification algorithms, varying numbers of explanations, and variability in explanations. A new algorithm is proposed to handle the case where some training examples do not have explanations. Our results show that TED is robust to increasing numbers of explanations, noisy explanations, and large fractions of missing explanations, thus making advances toward its practical deployment.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-13
Dou Goodman; Hao Xin; Wang Yang; Wu Yuesheng; Xiong Junfeng; Zhang Huan

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Avinava Dubey; Michael Minyi Zhang; Eric P. Xing; Sinead A. Williamson

Bayesian nonparametric (BNP) models provide elegant methods for discovering underlying latent features within a data set, but inference in such models can be slow. We exploit the fact that completely random measures, which commonly used models like the Dirichlet process and the beta-Bernoulli process can be expressed as, are decomposable into independent sub-measures. We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components. We then select different inference algorithms for the two components: uncollapsed samplers mix well on the finite measure, while collapsed samplers mix well on the infinite, sparsely occupied tail. The resulting hybrid algorithm can be applied to a wide class of models, and can be easily distributed to allow scalable inference without sacrificing asymptotic convergence guarantees.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Varun Embar; Sriram Srinivasan; Lise Getoor

Aggregate network properties such as cluster cohesion and the number of bridge nodes can be used to glean insights about a network's community structure, spread of influence and the resilience of the network to faults. Efficiently computing network properties when the network is fully observed has received significant attention (Wasserman and Faust 1994; Cook and Holder 2006); however, the problem of computing aggregate network properties when attributes are missing has received little attention. Computing these properties for networks with missing attributes involves performing inference over the network. Statistical relational learning (SRL) and graph neural networks (GNNs) are two classes of machine learning approaches well suited for inferring missing attributes in a graph. In this paper, we study the effectiveness of these approaches in estimating aggregate properties on networks with missing attributes. We compare two SRL approaches and three GNNs. For these approaches we estimate these properties using point estimates such as MAP and mean. For SRL-based approaches that can infer a joint distribution over the missing attributes, we also estimate these properties as an expectation over the distribution. To compute the expectation tractably for probabilistic soft logic, one of the SRL approaches that we study, we introduce a novel sampling framework. In the experimental evaluation, using three benchmark datasets, we show that SRL-based approaches tend to outperform GNN-based approaches both in computing aggregate properties and in predictive accuracy. Specifically, we show that estimating the aggregate properties as an expectation over the joint distribution outperforms point estimates.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Toshitaka Hayashi; Hamido Fujita

Supervised learning requires a sufficient training dataset that includes all labels. However, there are cases in which some class is absent from the training data. Zero-Shot Learning (ZSL) is the task of predicting a class that is not in the training data (the target class). Existing ZSL methods are designed for image data. However, the zero-shot problem can arise for every data type, so ZSL for other data types needs to be considered. In this paper, we propose a cluster-based ZSL method, which is a baseline method for multivariate binary classification problems. The proposed method is based on the assumption that data far from the training data belongs to the target class. In training, the training data is clustered. In prediction, each data point is checked for membership in a cluster. If it does not belong to any cluster, it is predicted as the target class. The proposed method is evaluated and demonstrated using the KEEL dataset.
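
The train-then-check-membership procedure described above can be sketched as follows. The abstract does not say which clustering algorithm or membership rule is used, so the plain k-means with first-k initialization, the per-cluster radius (largest member distance) and the `slack` factor are all assumptions made for illustration.

```python
import math

def dist(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign(points, centroids):
    """Group points by their nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[idx].append(p)
    return groups

def fit_clusters(train, k, iters=20):
    """Plain k-means on the seen-class data; returns centroids and cluster radii."""
    centroids = [list(p) for p in train[:k]]  # deterministic init (assumption)
    for _ in range(iters):
        groups = assign(train, centroids)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    groups = assign(train, centroids)
    radii = [max((dist(p, c) for p in g), default=0.0)
             for g, c in zip(groups, centroids)]
    return centroids, radii

def predict_is_target(x, centroids, radii, slack=1.0):
    """True if x lies outside every cluster, i.e. is predicted as the unseen target class."""
    return all(dist(x, c) > slack * r for c, r in zip(centroids, radii))
```

A point between two well-separated training clusters falls outside both radii and is labeled as the target class, while a point near a cluster centre is labeled as a seen class. Raising `slack` makes the rule more conservative about declaring the target class.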

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Haitao Xu; Brendan McCane; Lech Szymanski; Craig Atkinson

We show that reinforcement learning agents that learn by surprise (surprisal) get stuck at abrupt environmental transition boundaries because these transitions are difficult to learn. We propose a counter-intuitive solution that we call Mutual Information Minimising Exploration (MIME) where an agent learns a latent representation of the environment without trying to predict the future states. We show that our agent performs significantly better over sharp transition boundaries while matching the performance of surprisal driven agents elsewhere. In particular, we show state-of-the-art performance on difficult learning games such as Gravitar, Montezuma's Revenge and Doom.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Saeed Amirgholipour; Xiangjian He; Wenjing Jia; Dadong Wang; Lei Liu

Crowd counting, i.e., estimating the number of people in a crowded area, has attracted much interest in the research community. Although many attempts have been reported, crowd counting remains an open real-world problem due to the vast scale variations in crowd density within the area of interest and severe occlusion among the crowd. In this paper, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, that leverages attention, pyramid scale features and two-branch decoder modules for density-aware crowd counting. The PDANet utilizes these modules to extract features at different scales, focus on the relevant information, and suppress the misleading information. We also address the variation of crowdedness levels among different images with an exclusive Density-Aware Decoder (DAD). For this purpose, a classifier evaluates the density level of the input features and then passes them to the corresponding high and low crowded DAD modules. Finally, we generate an overall density map by considering the summation of low and high crowded density maps as spatial attention. Meanwhile, we employ two losses to create a precise density map for the input scene. Extensive evaluations conducted on challenging benchmark datasets demonstrate the superior performance of the proposed PDANet over well-known state-of-the-art methods in terms of the accuracy of counting and of the generated density maps.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Xiaoxiao Li; Yufeng Gu; Nicha Dvornek; Lawrence Staib; Pamela Ventola; James S. Duncan

Deep learning models have shown their advantage in many different tasks, including neuroimage analysis. However, to effectively train a high-quality deep learning model, the aggregation of a significant amount of patient information is required. The time and cost for acquisition and annotation in assembling, for example, large fMRI datasets make it difficult to acquire large numbers at a single site. However, due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions. Federated learning allows for population-level models to be trained without centralizing entities' data by transmitting the global model to local entities, training the model locally, and then averaging the gradients or weights in the global model. However, some studies suggest that private information can be recovered from the model gradients or weights. In this work, we address the problem of multi-site fMRI classification with a privacy-preserving strategy. To solve the problem, we propose a federated learning approach, where a decentralized iterative optimization algorithm is implemented and shared local model weights are altered by a randomization mechanism. Considering the systemic differences of fMRI distributions from different sites, we further propose two domain adaptation methods in this federated learning formulation. We investigate various practical aspects of federated model optimization and compare federated learning with alternative training strategies. Overall, our results demonstrate that it is promising to utilize multi-site data without data sharing to boost neuroimage analysis performance and find reliable disease-related biomarkers. Our proposed pipeline can be generalized to other privacy-sensitive medical data analysis problems.
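
The federated loop described above (send the global model to each site, train locally, perturb the shared weights, average) can be sketched on a one-parameter least-squares model. Everything specific here is an assumption: the scalar model, the unweighted average, and additive Gaussian noise as the randomization mechanism. The domain adaptation methods from the abstract are not shown.

```python
import random

def local_update(w, data, lr=0.1, steps=5):
    """A few gradient steps on one site's private data for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, sites, noise_std, rng):
    """One round: local training per site, noisy weight sharing, server-side average."""
    shared = []
    for data in sites:
        w_local = local_update(w_global, data)
        # Only the perturbed weight leaves the site, never the raw data.
        shared.append(w_local + rng.gauss(0.0, noise_std))
    return sum(shared) / len(shared)

def train(sites, rounds=20, noise_std=0.0, seed=0):
    """Run several federated rounds starting from w = 0."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, sites, noise_std, rng)
    return w
```

With two sites whose data follow y = 2x and `noise_std=0.0`, the global weight converges to 2 without either site revealing its samples. A positive `noise_std` trades some accuracy for less information leaking through the shared weights.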

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Hongwei Xie; Jiafang Wang; Baitao Shao; Jian Gu; Mingyang Li

Online hand gesture recognition (HGR) techniques are essential in augmented reality (AR) applications for enabling natural human-to-computer interaction and communication. In recent years, the consumer market for low-cost AR devices has been rapidly growing, while the technology maturity in this domain is still limited. These devices typically have low prices, limited memory, and resource-constrained computational units, which makes online HGR a challenging problem. To tackle this problem, we propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power. We also show that the proposed method is highly accurate and robust, and is able to reach high-end performance in a variety of complicated interaction environments. To achieve our goal, we first propose a cascaded multi-task convolutional neural network (CNN) to simultaneously predict probabilities of hand detection and regress hand keypoint locations online. We show that, with the proposed cascaded architecture design, false-positive estimates can be largely eliminated. Additionally, an associated mapping approach is introduced to track the hand trace via the predicted locations, which addresses the interference of multi-handedness. Subsequently, we propose a trace sequence neural network (TraceSeqNN) to recognize the hand gesture by exploiting the motion features of the tracked trace. Finally, we provide a variety of experimental results to show that the proposed framework is able to achieve state-of-the-art accuracy with significantly reduced computational cost, which are the key properties for enabling real-time applications in low-cost commercial devices such as mobile devices and AR/VR headsets.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Pegah Jandaghi; Jay Pujara

In many scenarios, humans prefer a text-based representation of quantitative data over numerical, tabular, or graphical representations. The attractiveness of textual summaries for complex data has inspired research on data-to-text systems. While there are several data-to-text tools for time series, few of them try to mimic how humans summarize time series. In this paper, we propose a model to create human-like text descriptions for time series. Our system finds patterns in time series data and ranks these patterns based on empirical observations of human behavior using utility estimation. Our proposed utility estimation model is a Bayesian network capturing interdependencies between different patterns. We describe the learning steps for this network and introduce baselines along with their performance for each step. The output of our system is a natural language description of time series that attempts to match a human's summary of the same data.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
T. Serizawa; H. Fujita

Convolutional neural networks (CNNs) are among the most frequently used deep learning techniques. Various models have been proposed and improved for learning with CNNs. When learning with a CNN, it is necessary to determine the optimal hyperparameters. However, the number of hyperparameters is so large that it is difficult to tune them manually, so much research has been done on automation. Methods that use metaheuristic algorithms are attracting attention in research on hyperparameter optimization. Metaheuristic algorithms are nature-inspired and include evolution strategies, genetic algorithms, ant colony optimization and particle swarm optimization. In particular, particle swarm optimization converges faster than genetic algorithms, and various models have been proposed. In this paper, we propose CNN hyperparameter optimization with linearly decreasing weight particle swarm optimization (LDWPSO). In the experiments, the MNIST and CIFAR-10 data sets, which are often used as benchmarks, are used. We optimize CNN hyperparameters with LDWPSO, train on the MNIST and CIFAR-10 datasets, and compare the accuracy with a standard CNN based on LeNet-5. On the MNIST dataset, the baseline CNN reaches 94.02% accuracy at the 5th epoch, compared to 98.95% for the LDWPSO CNN. On the CIFAR-10 dataset, the baseline CNN reaches 28.07% at the 10th epoch, compared to 69.37% for the LDWPSO CNN, a large improvement in accuracy.
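
As a rough illustration of the linearly decreasing inertia weight that gives LDWPSO its name, the sketch below runs a small particle swarm on a toy objective. It is not the authors' implementation: the bounds `w_max=0.9` and `w_min=0.4`, the acceleration constants `c1` and `c2`, and the function names are assumptions chosen for illustration, and a real hyperparameter search would replace the toy objective with a validation error returned by CNN training.

```python
import random


def ldw_inertia(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Inertia weight decreasing linearly from w_max to w_min (assumed bounds)."""
    return w_max - (w_max - w_min) * iteration / max(1, max_iter - 1)


def ldwpso(objective, bounds, n_particles=20, max_iter=50, c1=2.0, c2=2.0, seed=0):
    """Minimize `objective` over box `bounds` with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best value found so far, per iteration

    for it in range(max_iter):
        w = ldw_inertia(it, max_iter)  # large w early (explore), small w late (exploit)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


# Toy stand-in for a validation error: the sphere function.
best, best_val, history = ldwpso(lambda p: sum(x * x for x in p), [(-5.0, 5.0)] * 2)
```

In a hyperparameter setting, each particle coordinate would encode one hyperparameter (for example a filter count or kernel size, rounded to an integer before training).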

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Léopold Cambier; Anahita Bhiwandiwalla; Ting Gong; Mehran Nekuii; Oguz H Elibol; Hanlin Tang

Training with a larger number of parameters while keeping iterations fast is an increasingly adopted strategy for developing better-performing Deep Neural Network (DNN) models. This necessitates an increased memory footprint and higher computational requirements for training. Here we introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers. Reduced bit precision allows for a larger effective memory and increased computational speed. We name this method Shifted and Squeezed FP8 (S2FP8). We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models: ResNet-50, Transformer and NCF. The method can maintain model accuracy without requiring fine-tuning of loss scaling parameters or keeping certain layers in single precision. We introduce two learnable statistics of the DNN tensors, the shifted and squeezed factors, which are used to optimally adjust the range of the tensors in 8 bits, thus minimizing the loss of information due to quantization.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Ji Hyung Jung; Hye Won Chung; Ji Oon Lee

We study the statistical decision process of detecting the presence of signal from a 'signal+noise' type matrix model with an additive Wigner noise. We derive the error of the likelihood ratio test, which minimizes the sum of the Type-I and Type-II errors, under the Gaussian noise for the signal matrix with arbitrary finite rank. We propose a hypothesis test based on the linear spectral statistics of the data matrix, which is optimal and does not depend on the distribution of the signal or the noise. We also introduce a test for rank estimation that does not require the prior information on the rank of the signal.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Youchuan Hu; Le Yan; Tingting Hang; Jun Feng

Stream-flow forecasting for small rivers has always been of great importance, yet comparatively challenging due to the special features of rivers with smaller volume. Artificial Intelligence (AI) methods have long been employed in this area, but forecast quality still leaves room for improvement. In this paper, we provide a new forecasting method using the Long Short-Term Memory (LSTM) deep learning model, which is designed for time-series data. We collected stream flow data from one hydrologic station in Tunxi, China, and precipitation data from 11 surrounding rainfall stations, and used LSTM to forecast the stream flow at that hydrologic station 6 hours into the future. We evaluated the prediction results using three criteria: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R^2). By comparing LSTM's predictions with those of Support Vector Regression (SVR) and Multilayer Perceptron (MLP) models, we showed that LSTM has better performance, achieving an RMSE of 82.007, an MAE of 27.752, and an R^2 of 0.970. We also conducted extended experiments on the LSTM model, discussing the factors that influence its performance.
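
The three evaluation criteria named in this abstract have standard definitions, sketched below in plain Python. The function name and the sample values are illustrative assumptions; they are not the paper's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot  # assumes y_true is not constant
    return rmse, mae, r2


# Made-up observed and forecast flows, purely for illustration.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

RMSE penalizes large errors more heavily than MAE, which is consistent with the reported RMSE (82.007) being well above the reported MAE (27.752).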

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Li Ye; Yishi Lin; Hong Xie; John C. S. Lui

A fundamental question for companies is: how to make good decisions with the increasing amount of logged data? Currently, companies run online tests (e.g. A/B tests) before making decisions. However, online tests can be expensive because testing inferior decisions hurts users' experiences. On the other hand, offline causal inference analyzes logged data alone to make decisions, but once a wrong decision is made by the offline causal inference, this wrong decision will continue to hurt all users' experience. In this paper, we unify offline causal inference and online bandit learning to make the right decision. Our framework is flexible enough to incorporate various causal inference methods (e.g. matching, weighting) and online bandit methods (e.g. UCB, LinUCB). For these novel combinations of algorithms, we derive theoretical bounds on the decision maker's "regret" compared to its optimal decision. We also derive the first regret bound for forest-based online bandit algorithms. Experiments on synthetic data show that our algorithms outperform methods that use only the logged data or only the online feedback.
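
To make the idea of combining logged data with an online bandit concrete, here is a deliberately naive sketch: a standard UCB1 learner whose per-arm statistics are initialized from logged (arm, reward) pairs. This is not the paper's framework. In particular, it treats logged rewards as unbiased, whereas the abstract describes using causal inference methods (matching, weighting) to correct the logged data first. All function names are hypothetical.

```python
import math


def warm_start(logged):
    """Build per-arm [pulls, reward_sum] statistics from logged (arm, reward) pairs."""
    stats = {}
    for arm, reward in logged:
        entry = stats.setdefault(arm, [0, 0.0])
        entry[0] += 1
        entry[1] += reward
    return stats


def ucb1_index(mean_reward, pulls, total_pulls):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def choose_arm(stats):
    """Pick the arm with the largest UCB1 index (every arm needs at least one pull)."""
    total = sum(pulls for pulls, _ in stats.values())
    return max(
        stats,
        key=lambda arm: ucb1_index(stats[arm][1] / stats[arm][0], stats[arm][0], total),
    )


def update(stats, arm, reward):
    """Record one round of online feedback."""
    stats[arm][0] += 1
    stats[arm][1] += reward


# Illustrative logged data: arm "A" looked better offline.
stats = warm_start([("A", 1.0), ("A", 1.0), ("B", 0.0), ("B", 0.0)])
first_choice = choose_arm(stats)
update(stats, first_choice, 1.0)
```

Because the logged counts shrink the exploration bonus, the learner starts from the offline estimate but can still overturn it as online feedback accumulates.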

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Linh Kästner; Daniel Dimitrov; Jens Lambrecht

Augmented Reality has been subject to various integration efforts within industries due to its ability to enhance human-machine interaction and understanding. Neural networks have achieved remarkable results in areas of computer vision, which bear great potential to assist and facilitate an enhanced Augmented Reality experience. However, most neural networks are computationally intensive and demand huge processing power, and are thus not suitable for deployment on Augmented Reality devices. In this work, we propose a method to deploy state-of-the-art neural networks for real-time 3D object localization on augmented reality devices. As a result, we provide a more automated method of calibrating the AR devices with mobile robotic systems. To accelerate the calibration process and enhance user experience, we focus on fast 2D detection approaches which extract the 3D pose of the object quickly and accurately using only 2D input. The results are implemented into an Augmented Reality application for intuitive robot control and sensor data visualization. For the 6D annotation of 2D images, we developed an annotation tool, which is, to our knowledge, the first open source tool to be available. We achieve feasible results which are generally applicable to any AR device, thus making this work promising for further research in combining highly demanding neural networks with Internet of Things devices.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Guadalupe Gonzalez; Shunwang Gong; Ivan Laponogov; Kirill Veselkov; Michael Bronstein

Recent research efforts have shown the possibility to discover anticancer drug-like molecules in food from their effect on protein-protein interaction networks, opening a potential pathway to disease-beating diet design. We formulate this task as a graph classification problem on which graph neural networks (GNNs) have achieved state-of-the-art results. However, GNNs are difficult to train on sparse low-dimensional features according to our empirical evidence. Here, we present graph augmented features, integrating graph structural information and raw node attributes with varying ratios, to ease the training of networks. We further introduce a novel neural network architecture on graphs, the Graph Attentional Autoencoder (GAA) to predict food compounds with anticancer properties based on perturbed protein networks. We demonstrate that the method outperforms the baseline approach and state-of-the-art graph classification models in this task.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Raju Ram; Sabine Müller; Franz-Josef Pfreundt; Nicolas R. Gauger; Janis Keuper

Most machine learning methods require careful selection of hyper-parameters in order to train a high-performing model with good generalization abilities. Hence, several automatic selection algorithms have been introduced to overcome tedious manual (trial and error) tuning of these parameters. Due to its very high sample efficiency, Bayesian Optimization over a Gaussian Process model of the parameter space has become the method of choice. Unfortunately, this approach suffers from cubic compute complexity due to the underlying Cholesky factorization, which makes it very hard to scale beyond a small number of sampling steps. In this paper, we present a novel, highly accurate approximation of the underlying Gaussian Process. Reducing its computational complexity from cubic to quadratic allows an efficient strong scaling of Bayesian Optimization while outperforming the previous approach regarding optimization accuracy. The first experiments show speedups of a factor of 162 on a single node and a further speedup by a factor of 5 in a parallel environment.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Diego García-Gil; Johan Holmberg; Salvador García; Ning Xiong; Francisco Herrera

Big Data scenarios pose a new challenge to traditional data mining algorithms, since they are not prepared to work with such amounts of data. Smart Data refers to data of sufficient quality to improve the outcome of a data mining algorithm. The inability of existing data mining algorithms to handle Big Datasets prevents the transition from Big to Smart Data. The automation in data acquisition that characterizes Big Data also brings some problems, such as differences in data size per class. This will lead classifiers to lean towards the most represented classes. This problem is known as imbalanced data distribution, where one class is underrepresented in the dataset. Ensembles of classifiers are machine learning methods that improve the performance of a single base classifier by combining several of them. Ensembles are not exempt from the imbalanced classification problem. To deal with this issue, the ensemble method has to be designed specifically. In this paper, a data preprocessing ensemble for imbalanced Big Data classification is presented, with a focus on two-class problems. Experiments carried out on 21 Big Datasets have shown that our ensemble classifier outperforms classic machine learning models with an added data balancing method, such as Random Forests.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Tuyen Trung Truong

Our main result concerns the following condition: {\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly converges to $x$ and $\lim _{n\rightarrow\infty}||\nabla f(x_n)||=0$, then $\nabla f(x)=0$. We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space. {\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$, then $\sup _{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $\alpha ,\beta ,\delta _0$, see later for details) the sequence $x_{n+1}=x_n-\delta (x_n)\nabla f(x_n)$. Then we have: 1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a critical point of $f$. 2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim _{n\rightarrow\infty}||x_{n+1}-x_n||=0$. 3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If $\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected. 4) Assume that $f$ has at most countably many saddle points. Then for generic choices of $\alpha ,\beta ,\delta _0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges - in the {\bf weak} topology, then the limit point cannot be a saddle point.
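
For orientation, the sketch below implements the standard Armijo backtracking rule in finite dimensions with the three hyper-parameters $\alpha, \beta, \delta_0$ mentioned above: start from step $\delta_0$ and multiply by $\beta$ until $f(x-\delta\nabla f(x))-f(x)\le -\alpha\delta\|\nabla f(x)\|^2$. This is an assumption-laden illustration of plain backtracking GD on $\mathbb{R}^n$, not the Local Backtracking GD variant or the Banach-space setting of the theorem; the default values and the stopping tolerance are arbitrary choices.

```python
import numpy as np


def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.5, delta0=1.0,
                    tol=1e-10, max_iter=1000):
    """Gradient descent with Armijo backtracking line search on R^n."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        g_norm_sq = float(np.dot(g, g))
        if g_norm_sq < tol ** 2:  # gradient is numerically zero: critical point
            break
        delta = delta0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - delta * g) - f(x) > -alpha * delta * g_norm_sq:
            delta *= beta
        x = x - delta * g  # x_{n+1} = x_n - delta(x_n) * grad f(x_n)
    return x


# Toy example: f(x) = ||x||^2 has its only critical point at the origin.
x_star = backtracking_gd(lambda x: float(np.sum(x ** 2)), lambda x: 2.0 * x, [3.0, -4.0])
```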

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Yichen Zhang; Chen Chen; Guodong Liu; Tianqi Hong; Feng Qiu

In this paper, we introduce a deep learning aided constraint encoding method to tackle the frequency-constraint microgrid scheduling problem. The nonlinear function between system operating condition and frequency nadir is approximated by using a neural network, which admits an exact mixed-integer formulation (MIP). This formulation is then integrated with the scheduling problem to encode the frequency constraint. With the stronger representation power of the neural network, the resulting commands can ensure adequate frequency response in a realistic setting in addition to islanding success. The proposed method is validated on a modified 33-node system. Successful islanding with a secure response is simulated under the scheduled commands using a detailed three-phase model in Simulink. The advantages of our model are particularly remarkable when the inertia emulation functions from wind turbine generators are considered.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Jian Huang; Yuling Jiao; Lican Kang; Jin Liu; Yanyan Liu; Xiliang Lu

Feature selection is important for modeling high-dimensional data, where the number of variables can be much larger than the sample size. In this paper, we develop a support detection and root finding procedure to learn high-dimensional sparse generalized linear models and denote this method by GSDAR. Based on the KKT condition for $\ell_0$-penalized maximum likelihood estimation, GSDAR generates a sequence of estimators iteratively. Under some restricted invertibility conditions on the maximum likelihood function and a sparsity assumption on the target coefficients, the errors of the proposed estimate decay exponentially to the optimal order. Moreover, the oracle estimator can be recovered if the target signal is stronger than the detectable level. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods, including Lasso and MCP.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-08
Georg Hille; Johannes Steffen; Max Dünnwald; Mathias Becker; Sylvia Saalfeld; Klaus Tönnies

This study's objective was to segment spinal metastases in diagnostic MR images using a deep learning-based approach. Segmentation of such lesions can present a pivotal step towards enhanced therapy planning and validation, as well as intervention support during minimally invasive and image-guided surgeries like radiofrequency ablations. For this purpose, we used a U-Net-like architecture trained with 40 clinical cases including both lytic and sclerotic lesion types and various MR sequences. Our proposed method was evaluated with regard to various factors influencing the segmentation quality, e.g. the MR sequences used and the input dimension. We quantitatively assessed our experiments using Dice coefficients and sensitivity and specificity rates. Compared to expertly annotated lesion segmentations, the experiments yielded promising results with average Dice scores up to 77.6% and mean sensitivity rates up to 78.9%. To the best of our knowledge, our proposed study is one of the first to tackle this particular issue, which limits direct comparability with related works. With respect to similar deep learning-based lesion segmentations, e.g. in liver MR images or spinal CT images, our experiments showed similar or in some respects superior segmentation quality. Overall, our automatic approach can provide almost expert-like segmentation accuracy in this challenging and ambitious task.
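
The Dice coefficient, sensitivity and specificity used in this evaluation follow standard definitions over binary masks, sketched here with NumPy. The function name and the tiny example masks are illustrative assumptions, and the sketch assumes each mask contains both foreground and background so that no denominator is zero.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Return (Dice, sensitivity, specificity) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # lesion voxels correctly segmented
    fp = int(np.sum(pred & ~truth))   # background labeled as lesion
    fn = int(np.sum(~pred & truth))   # lesion voxels missed
    tn = int(np.sum(~pred & ~truth))  # background correctly left out
    dice = 2.0 * tp / (2.0 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dice, sensitivity, specificity


# Four-voxel toy masks, not clinical data.
dice, sens, spec = segmentation_scores([1, 1, 0, 0], [1, 0, 1, 0])
```

Specificity is usually close to 1 in lesion segmentation because background voxels vastly outnumber lesion voxels, which is one reason Dice and sensitivity are the figures highlighted in the abstract.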

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-14
Gilberto Luis De Conto Junior

Many people are affected by diabetes around the world. The disease occurs as type 1 or type 2. Diabetes brings with it several complications, including diabetic retinopathy, a disease that, if not treated correctly, can lead to irreversible damage to the patient's vision. The earlier it is detected, the better the chances that the patient will not lose vision. Methods for automating manual procedures are currently prominent, while the diagnostic process for retinopathy is manual, with the physician analyzing the patient's retina on a monitor. Image recognition can aid this detection by recognizing diabetic retinopathy patterns and comparing them with the patient's retina during diagnosis. This method can also assist in telemedicine, in which people without access to the exam can benefit from the diagnosis provided by the application. The application was developed using convolutional neural networks, which perform digital image processing by analyzing each image pixel. Using VGG-16 as the pre-trained base model for the application proved very useful, and the final model accuracy was 82%.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-04
German I. Parisi

The robust recognition and assessment of human actions are crucial in human-robot interaction (HRI) domains. While state-of-the-art models of action perception show remarkable results in large-scale action datasets, they mostly lack the flexibility, robustness, and scalability needed to operate in natural HRI scenarios which require the continuous acquisition of sensory information as well as the classification or assessment of human body patterns in real time. In this chapter, I introduce a set of hierarchical models for the learning and recognition of actions from depth maps and RGB images through the use of neural network self-organization. A particularity of these models is the use of growing self-organizing networks that quickly adapt to non-stationary distributions and implement dedicated mechanisms for continual learning from temporally correlated input.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-04
Anandhanarayanan Kamalakannan; Shiva Shankar Ganesan; Govindaraj Rajamanickam

Image segmentation and classification are the two main fundamental steps in pattern recognition. Performing medical image segmentation or classification with deep learning models requires training on a large annotated image dataset. The dermoscopy images (ISIC archive) considered for this work do not have ground truth information for lesion segmentation. Performing manual labelling on this dataset is time-consuming. To overcome this issue, a self-learning annotation scheme is proposed within a two-stage deep learning algorithm. The two-stage deep learning algorithm consists of a U-Net segmentation model with the annotation scheme and a CNN classifier model. The annotation scheme uses a K-means clustering algorithm along with merging conditions to obtain initial labelling information for training the U-Net model. The classifier models, namely ResNet-50 and LeNet-5, were trained and tested on the image dataset without segmentation for comparison and with the U-Net segmentation for implementing the proposed self-learning Artificial Intelligence (AI) framework. The proposed AI framework achieved a training accuracy of 93.8% and a testing accuracy of 82.42%, compared against the two classifier models trained directly on the input images.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-03
David Noever; Wes Regian; Matt Ciolino; Josh Kalin; Dom Hambrick; Kaye Blankenship

Small satellite constellations provide daily global coverage of the earth's landmass, but image enrichment relies on automating key tasks like change detection or feature searches. For example, to extract text annotations from raw pixels requires two dependent machine learning models, one to analyze the overhead image and the other to generate a descriptive caption. We evaluate seven models on the previously largest benchmark for satellite image captions. We extend the labeled image samples five-fold, then augment, correct and prune the vocabulary to approach a rough min-max (minimum word, maximum description). This outcome compares favorably to previous work with large pre-trained image models but offers a hundred-fold reduction in model size without sacrificing overall accuracy (when measured with log entropy loss). These smaller models provide new deployment opportunities, particularly when pushed to edge processors, on-board satellites, or distributed ground stations. To quantify a caption's descriptiveness, we introduce a novel multi-class confusion or error matrix to score both human-labeled test data and never-labeled images that include bounding box detection but lack full sentence captions. This work suggests future captioning strategies, particularly ones that can enrich the class coverage beyond land use applications and that lessen color-centered and adjacency adjectives ("green", "near", "between", etc.). Many modern language transformers present novel and exploitable models with world knowledge gleaned from training from their vast online corpus. One interesting, but easy example might learn the word association between wind and waves, thus enriching a beach scene with more than just color descriptions that otherwise might be accessed from raw pixels without text annotation.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2019-12-30
Takahiro Suzuki; Shingo Takeshita; Satoshi Ono

This paper proposes an Evolutionary Multi-objective Optimization (EMO)-based Adversarial Example (AE) design method that operates in a black-box setting. Previous gradient-based methods produce AEs by changing all pixels of a target image, while the previous EC-based method changes a small number of pixels to produce AEs. Thanks to EMO's population-based search, the proposed method produces various types of AEs, including ones lying between the AEs generated by the previous two approaches, which helps to understand the characteristics of a target model or to discover unknown attack patterns. Experimental results showed the potential of the proposed method; e.g., it can generate robust AEs and, with the aid of DCT-based perturbation pattern generation, AEs for high-resolution images.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2019-12-28
Cansu Sancaktar; Pablo Lanillos

We present a pixel-based deep Active Inference algorithm (PixelAI) inspired by human body perception and successfully validated in robot body perception and action as a use case. Our algorithm combines the free energy principle from neuroscience, rooted in variational inference, with deep convolutional decoders to scale the algorithm to deal directly with image input and provide online adaptive inference. The approach enables the robot to perform 1) dynamical body estimation of its arm using only raw monocular camera images and 2) autonomous reaching to "imagined" arm poses in the visual space. We statistically analyzed the algorithm's performance on a simulated and a real Nao robot. Results show how the same algorithm deals with both perception and action, modelled as an inference optimization problem.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2019-12-27
Alberto Rossi; Markus Hagenbuchner; Franco Scarselli; Ah Chung Tsoi

This paper extends the fully recursive perceptron network (FRPN) model for vectorial inputs to include deep convolutional neural networks (CNNs), which can accept multi-dimensional inputs. An FRPN consists of a recursive layer, which, given a fixed input, iteratively computes an equilibrium state. The unfolding realized with this kind of iterative mechanism makes it possible to simulate a deep neural network with any number of layers. The extension of the FRPN to CNNs results in an architecture, which we call convolutional-FRPN (C-FRPN), where the convolutional layers are recursive. The method is evaluated on several image classification benchmarks. It is shown that the C-FRPN consistently outperforms standard CNNs having the same number of parameters. The gap in performance is particularly large for small networks, showing that the C-FRPN is a very powerful architecture, since it achieves equivalent performance with fewer parameters when compared with deep CNNs.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2019-12-27
Mingxin Zhao; Li Cheng; Xu Yang; Peng Feng; Liyuan Liu; Nanjian Wu

Infrared small target detection is a key technique in infrared search and tracking (IRST) systems. Although deep learning has recently been widely used in vision tasks on visible light images, it is rarely used in infrared small target detection due to the difficulty of learning small target features. In this paper, we propose a novel lightweight convolutional neural network, TBC-Net, for infrared small target detection. TBC-Net consists of a target extraction module (TEM) and a semantic constraint module (SCM), which are used to extract small targets from infrared images and to classify the extracted target images during training, respectively. Meanwhile, we propose a joint loss function and a training method. The SCM imposes a semantic constraint on the TEM by incorporating the high-level classification task, which addresses the difficulty of learning features caused by class imbalance. During training, the targets are extracted from the input image and then classified by the SCM. During inference, only the TEM is used to detect the small targets. We also propose a data synthesis method to generate training data. The experimental results show that, compared with traditional methods, TBC-Net can better reduce the false alarms caused by complicated backgrounds, and that the proposed network structure and joint loss significantly improve small target feature learning. Besides, TBC-Net can achieve real-time detection on the NVIDIA Jetson AGX Xavier development board, which makes it suitable for applications such as field research with drones equipped with infrared sensors.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-09
Jacob J Decoto; David RC Dayton

Uncorrelated optical space observation association represents a classic needle-in-a-haystack problem. The objective is to find small groups of observations that are likely of the same resident space objects (RSOs) from among the much larger population of all uncorrelated observations. These observations may be widely disparate both temporally and with respect to the observing sensor position. By training on a large representative data set, this paper shows that a deep learning model with no encoded knowledge of physics or orbital mechanics can learn to identify observations of common objects. When presented with balanced input sets of 50% matching observation pairs, the learned model was able to correctly identify whether the observation pairs were of the same RSO 83.1% of the time. The resulting learned model is then used in conjunction with a search algorithm on an unbalanced demonstration set of 1,000 disparate simulated uncorrelated observations and is shown to successfully identify true three-observation sets representing 111 out of 142 objects in the population, with most objects being identified in multiple three-observation triplets. This is accomplished while only exploring 0.06% of the search space of 1.66e8 possible unique triplet combinations.
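
The quoted search-space size can be checked with a line of arithmetic: the number of unordered triplets drawn from 1,000 observations is C(1000, 3). The short snippet below (which needs Python 3.8 or later for `math.comb`) reproduces the 1.66e8 figure and estimates how many triplets 0.06% corresponds to; the variable names are illustrative.

```python
import math

n_obs = 1000
triplets = math.comb(n_obs, 3)  # unordered triplets: 1000 * 999 * 998 / 6
explored = 0.0006 * triplets    # 0.06% of the search space

print(f"{triplets:.3e} triplets, about {explored:,.0f} explored")
```

So the search described in the abstract evaluates on the order of 100,000 candidate triplets out of roughly 166 million.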

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-08
Ethem F. Can; Aysu Ezen-Can

The success stories from deep learning models increase every day spanning different tasks from image classification to natural language understanding. With the increasing popularity of these models, scientists spend more and more time finding the optimal parameters and best model architectures for their tasks. In this paper, we focus on the ingredient that feeds these machines: the data. We hypothesize that the data ordering affects how well a model performs. To that end, we conduct experiments on an image classification task using ImageNet dataset and show that some data orderings are better than others in terms of obtaining higher classification accuracies. Experimental results show that independent of model architecture, learning rate and batch size, ordering of the data significantly affects the outcome. We show these findings using different metrics: NDCG, accuracy @ 1 and accuracy @ 5. Our goal here is to show that not only parameters and model architectures but also the data ordering has a say in obtaining better results.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-12
Siming Bayer; Ute Spiske; Jie Luo; Tobias Geimer; William M. Wells III; Martin Ostermeier; Rebecca Fahrig; Arya Nabavi; Christoph Bert; Ilker Eyupoglo; Andreas Maier

For a wide range of clinical applications, such as adaptive treatment planning or intraoperative image update, feature-based deformable registration (FDR) approaches are widely employed because of their simplicity and low computational complexity. FDR algorithms estimate a dense displacement field by interpolating a sparse field, which is given by the established correspondence between selected features. In this paper, we consider the deformation field as a Gaussian Process (GP), while the selected features are regarded as prior information on the valid deformations. Using a GP, we are able to estimate both the dense displacement field and a corresponding uncertainty map at once. Furthermore, we evaluated the performance of different hyperparameter settings for squared exponential kernels with synthetic, phantom and clinical data. The quantitative comparison shows that GP-based interpolation performs on par with state-of-the-art B-spline interpolation. The greatest clinical benefit of GP-based interpolation is that it gives a reliable estimate of the mathematical uncertainty of the calculated dense displacement map.
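
A minimal sketch of the underlying idea, GP regression with a squared exponential kernel that returns both an interpolated value and a variance, is given below. It works on a one-dimensional position axis with scalar displacements to stay short, whereas a registration problem would use 3D positions and vector displacements. The function names, the hyperparameter defaults and the sample points are assumptions for illustration, not the authors' settings.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential kernel matrix between 1-D position arrays a and b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return signal_var * np.exp(-0.5 * (a - b.T) ** 2 / length_scale ** 2)


def gp_interpolate(x_feat, d_feat, x_query,
                   length_scale=1.0, signal_var=1.0, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at the query positions."""
    d_feat = np.asarray(d_feat, dtype=float)
    K = sq_exp_kernel(x_feat, x_feat, length_scale, signal_var)
    K += noise_var * np.eye(len(d_feat))         # noise term, also stabilizes Cholesky
    K_s = sq_exp_kernel(x_feat, x_query, length_scale, signal_var)
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, d_feat))
    mean = K_s.T @ alpha                         # interpolated displacement
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)    # pointwise uncertainty
    return mean, np.maximum(var, 0.0)


# Three sparse feature displacements; query one feature location and one far away.
mean, var = gp_interpolate([0.0, 1.0, 2.0], [0.0, 0.5, -0.2], [1.0, 50.0])
```

The variance is near zero at a feature location and returns to the prior variance far from any feature, which is the behavior that makes the uncertainty map informative.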

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-12
Yiyan Chen; Li Tao; Xueting Wang; Toshihiko Yamasaki

Conventional video summarization approaches based on reinforcement learning have the problem that the reward can only be received after the whole summary is generated. Such a reward is sparse, which makes it hard for reinforcement learning to converge. Another problem is that labelling each frame is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality. This framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts the importance scores for video frames in the subtask by policy gradient according to both the global reward and innovatively defined sub-rewards to overcome the sparsity problem. Experiments on two benchmark datasets show that our proposal achieves the best performance, even better than supervised approaches.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Shubham Agarwal; Raghav Goyal

This manuscript describes our approach for the Visual Dialog Challenge 2018. We use an ensemble of three discriminative models with different encoders and decoders for our final submission. Our best performing model on 'test-std' split achieves the NDCG score of 55.46 and the MRR value of 63.77, securing third position in the challenge.
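
For readers unfamiliar with the two ranking metrics quoted here, the sketch below gives their generic textbook definitions. The challenge's official evaluation may differ in details (for example, how many candidates enter the NDCG sum), and the reported scores appear to be scaled by 100; both points are assumptions rather than facts taken from the abstract. Function names and sample inputs are illustrative.

```python
import math


def mrr(ranks):
    """Mean reciprocal rank; `ranks` are 1-based ranks of the correct answers."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


example_mrr = mrr([1, 2, 4])     # correct answer ranked 1st, 2nd and 4th
example_ndcg = ndcg([0.0, 1.0])  # the only relevant candidate is ranked second
```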

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Fanxu Meng; Hao Cheng; Ke Li; Zhixin Xu; Rongrong Ji; Xing Sun; Gaungming Lu

This paper proposes a new learning paradigm called filter grafting, which aims to improve the representation capability of Deep Neural Networks (DNNs). The motivation is that DNNs have unimportant (invalid) filters (e.g., l1 norm close to 0). These filters limit the potential of DNNs since they are identified as having little effect on the network. While filter pruning removes these invalid filters for efficiency, filter grafting re-activates them to boost accuracy. The activation is performed by grafting external information (weights) into invalid filters. To better perform the grafting process, we develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks. After the grafting operation, the network has very few invalid filters compared with its untouched state, empowering the model with more representation capacity. We also perform extensive experiments on classification and recognition tasks to show the superiority of our method. For example, the grafted MobileNetV2 outperforms the non-grafted MobileNetV2 by about 7 percent on the CIFAR-100 dataset.
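
As a rough illustration of the motivation, the sketch below flags filters whose l1 norm is close to 0 and overwrites them with weights from a second network. It is a deliberate simplification: the threshold, the flat list-of-lists weight layout, and the plain copy from a hypothetical `donor_layer` are assumptions, and the entropy-based criterion and adaptive weighting strategy mentioned in the abstract are not reproduced.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights of one (flattened) filter."""
    return sum(abs(w) for w in filter_weights)


def find_invalid_filters(layer, threshold=1e-3):
    """Indices of filters whose l1 norm falls below the threshold.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    return [i for i, f in enumerate(layer) if l1_norm(f) < threshold]


def graft(layer, donor_layer, threshold=1e-3):
    """Copy donor weights into invalid filters; keep all others unchanged.

    A plain copy stands in for the weighted grafting of the abstract.
    """
    invalid = set(find_invalid_filters(layer, threshold))
    return [list(donor_layer[i]) if i in invalid else list(f)
            for i, f in enumerate(layer)]
```

For example, with `layer = [[0.5, -0.3], [0.0, 0.0001], [1.0, 2.0]]` only the middle filter is flagged, so `graft` replaces that filter alone and leaves the other two as they were.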

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-14
Amir Erfan Eshratifar; Massoud Pedram

We propose a framework to design a light-weight neural multiplexer that, given the input and resource budgets, decides upon the appropriate model to be called for inference. Mobile devices can use this framework to offload the hard inputs to the cloud while inferring the easy ones locally. Moreover, in large-scale cloud-based intelligent applications, instead of replicating the most accurate model, a range of small and large models can be multiplexed depending on the input's complexity and resource budgets. Our experimental results demonstrate the effectiveness of our framework, benefiting both mobile users and cloud providers.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-14
Vivian Lai; Han Liu; Chenhao Tan

To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides better predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Harshitha Machiraju; Vineeth N Balasubramanian

Small, carefully crafted perturbations called adversarial perturbations can easily fool neural networks. However, these perturbations are largely additive and not naturally found. We turn our attention to the field of autonomous navigation, wherein adverse weather conditions such as fog have a drastic effect on the predictions of these systems. These weather conditions are capable of acting like natural adversaries that can help in testing models. To this end, we introduce a general notion of adversarial perturbations, which can be created using generative models, and provide a methodology inspired by Cycle-Consistent Generative Adversarial Networks to generate adversarial weather conditions for a given image. Our formulation and results show that these images provide a suitable testbed for steering models used in autonomous navigation. Our work also presents a more natural and general definition of adversarial perturbations based on perceptual similarity.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Li Wang; Zechen Bai; Yonghua Zhang; Hongtao Lu

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). Recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. The SG branch can generate a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
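
A soft switch of this kind is often written as a convex combination of two word distributions, as in pointer-style summarization models. The sketch below assumes that form; the function name `mix_word_probabilities`, the dictionary representation, and the fixed `switch` value are illustrative choices rather than details from the paper, where the switch would be produced by the network at each decoding step.

```python
def mix_word_probabilities(p_generate, p_copy, switch):
    """Blend two word distributions with a soft switch in [0, 1].

    p_generate: word -> probability from the generating branch (SG here).
    p_copy:     word -> probability from the copying branch (RWS here).
    switch:     weight on the generating branch; 1 - switch goes to copying.
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    vocabulary = set(p_generate) | set(p_copy)
    return {
        word: switch * p_generate.get(word, 0.0)
        + (1.0 - switch) * p_copy.get(word, 0.0)
        for word in vocabulary
    }
```

Because both inputs sum to one and the weights sum to one, the blended result is again a probability distribution, and a recalled word that the generating branch never proposes can still receive probability mass through the copying branch.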

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-15
Sourav Mishra; Subhajit Chaudhary; Hideaki Imaizumi; Toshihiko Yamasaki

This paper aims to evaluate the suitability of current deep learning methods for clinical workflow, focusing especially on dermatology. Although deep learning methods have been applied in attempts to reach dermatologist-level accuracy in several individual conditions, they have not been rigorously tested for common clinical complaints. Most projects involve data acquired in well-controlled laboratory conditions. This may not reflect regular clinical evaluation, where the corresponding image quality is not always ideal. We test the robustness of deep learning methods by simulating non-ideal characteristics on user-submitted images of ten classes of diseases. Assessing via imitated conditions, we have found that the overall accuracy drops and individual predictions change significantly in many cases despite robust training.

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Xiangxiang Chu; Xudong Li; Yi Lu; Bo Zhang; Jixiang Li

The expressiveness of the search space is a key concern in neural architecture search (NAS). Previous block-level approaches mainly focus on searching networks that chain one operation after another. Incorporating a multi-path search space with the one-shot doctrine remains untackled. In this paper, we investigate the supernet behavior under the multi-path setting, which we call MixPath. For a sampled training, simply switching multiple paths on and off incurs severe feature inconsistency, which deteriorates the convergence. To remedy this effect, we employ what we term shadow batch normalizations (SBN) to follow various path patterns. Experiments performed on CIFAR-10 show that our approach is effective regardless of the number of allowable paths. Further experiments are conducted on ImageNet to have a fair comparison with the latest NAS methods. Our code will be available at https://github.com/xiaomi-automl/MixPath.git .

Updated: 2020-01-17
• arXiv.cs.LG Pub Date : 2020-01-16
Divya Gautam; Maria Lomeli; Kostis Gourgoulias; Daniel H. Thompson; Saurabh Johri

We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser (arXiv:1711.00695) in order to learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$, where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some arbitrary subset of all random variables of the generative model of interest. In other words, we mimic the self-supervised training of a denoising autoencoder, where a dataset of unlabelled data is used as partially observed input and the neural approximator is optimised to minimise reconstruction loss. We focus on studying the underlying process of the partially observed data: how good is the neural approximator at learning all conditional distributions when the observation process at prediction time differs from the masking process during training? We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties.
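
One simple structure-agnostic scheme is to hide each variable independently with a fixed probability and ask the network to reconstruct the full sample. The sketch below assumes exactly that; the `MASK` placeholder, the function names, and the use of a plain list as a sample are hypothetical. A structure-dependent scheme would instead choose what to hide using the graph of the generative model, which is not shown here.

```python
import random

MASK = None  # hypothetical placeholder marking an unobserved variable


def mask_independently(sample, mask_prob, rng):
    """Hide each variable independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else value for value in sample]


def make_training_pair(sample, mask_prob, rng):
    """Return (partially observed input, full reconstruction target)."""
    return mask_independently(sample, mask_prob, rng), list(sample)


# Usage: build one denoising-style training pair from an unlabelled sample.
rng = random.Random(0)
observed, target = make_training_pair([1, 0, 1, 1, 0], mask_prob=0.5, rng=rng)
```

The question raised in the abstract then corresponds to training with one masking process, such as the independent one above, and evaluating on inputs whose missing entries were produced by a different process.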

Updated: 2020-01-17
Contents have been reproduced by permission of the publishers.
