Current journal category: "Artificial Intelligence" journals
  • Prognostic Factors of Rapid Symptoms Progression in Patients with Newly Diagnosed Parkinson’s Disease
    Artif. Intell. Med. (IF 3.574) Pub Date : 2020-01-21
    Kostas M. Tsiouris; Spiros Konitsiotis; Dimitrios D. Koutsouris; Dimitrios I. Fotiadis

    Tracking symptom progression in the early stages of Parkinson's disease (PD) is a laborious endeavor, as the disease can be expressed with vastly different phenotypes, forcing clinicians to follow a multi-parametric approach in patient evaluation, looking not only for motor symptomatology but also for non-motor complications, including cognitive decline, sleep problems and mood disturbances. Being neurodegenerative in nature, PD is expected to inflict a continuous degradation in patients' condition over time. The rate of symptom progression, however, is found to be even more chaotic than the vastly different phenotypes that can be expressed in the initial stages of PD. In this work, an analysis of baseline PD characteristics is performed using machine learning techniques to identify prognostic factors for early rapid progression of PD symptoms. Using open data from the Parkinson's Progression Markers Initiative (PPMI) study, an extensive set of baseline patient evaluation outcomes is examined to isolate determinants of rapid progression within the first two and four years of PD. The rate of symptom progression is estimated by tracking the change of the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) total score over the corresponding follow-up period. Patients are ranked according to their progression rates, and those with the highest rates of MDS-UPDRS total score increase per year of follow-up are assigned to the rapid progression class, using 5- and 10-quantile partitions. Classification performance against the rapid progression class was evaluated in a per-quantile partition analysis scheme and in a quantile-independent approach, respectively. The results show more accurate patient discrimination with quantile partitioning; however, a much more compact subset of baseline factors is extracted in the latter, making it more suitable for actual interventions in practice. Classification accuracy improved in all cases when using the longer 4-year follow-up period to estimate PD progression, suggesting that a prolonged patient evaluation can provide better outcomes in identifying the rapid progression phenotype. Non-motor symptoms are found to be the main determinants of rapid symptom progression in both follow-up periods, with autonomic dysfunction, mood impairment, anxiety, REM sleep behavior disorders, cognitive decline and memory impairment being alarming signs at baseline evaluation, along with rigidity symptoms, certain laboratory blood test results and genetic mutations.
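    The quantile-based labeling step described above can be sketched as follows; the function name, the toy scores, and the 5-quantile cutoff are illustrative assumptions, not taken from the PPMI data dictionary.

```python
# Hypothetical sketch: patients are ranked by annual MDS-UPDRS total-score
# change, and the fastest-progressing quantile is labeled "rapid progression".
import numpy as np

def label_rapid_progression(baseline_scores, followup_scores, years, n_quantiles=5):
    """Return a boolean mask marking the fastest-progressing quantile."""
    rates = (np.asarray(followup_scores) - np.asarray(baseline_scores)) / years
    # threshold separating the top 1/n_quantiles of progression rates
    threshold = np.quantile(rates, 1.0 - 1.0 / n_quantiles)
    return rates >= threshold

baseline = [20, 25, 18, 30, 22, 27, 19, 24, 21, 26]
followup = [24, 45, 20, 34, 25, 50, 21, 27, 23, 30]
mask = label_rapid_progression(baseline, followup, years=4.0, n_quantiles=5)
print(int(mask.sum()))  # 2 of the 10 toy patients fall in the top 5-quantile
```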

    Updated: 2020-01-22
  • Variational Autoencoder based Bipartite Network Embedding by Integrating Local and Global Structure
    Inform. Sci. (IF 5.524) Pub Date : 2020-01-21
    Pengfei Jiao; Minghu Tang; Hongtao Liu; Yaping Wang; Chunyu Lu; Huaming Wu

    As a powerful tool for machine learning on graphs, network embedding, which projects nodes into low-dimensional spaces, has a variety of applications on complex networks. Most current methods and models are not suitable for bipartite networks, which have two different types of nodes and no links between nodes of the same type. Furthermore, the few existing methods for bipartite network embedding ignore the internal mechanism and highly nonlinear structures of links. Therefore, in this paper, we propose a new deep learning method to learn node embeddings for bipartite networks based on the widely used autoencoder framework. Moreover, we carefully devise a node-level triplet including the two types of nodes to assign the embedding by integrating the local and global structures. Meanwhile, we apply the variational autoencoder (VAE), a deep generative model with natural advantages in data generation and reconstruction, to enhance the node embedding for the highly nonlinear relationships between nodes and complex features. Experiments on several widely used datasets show the effectiveness of the proposed model and corresponding algorithm compared with some baseline network (and bipartite) embedding techniques.

    Updated: 2020-01-22
  • Auto-weighted Multi-view Co-clustering via Fast Matrix Factorization
    Pattern Recogn. (IF 5.898) Pub Date : 2020-01-21
    Feiping Nie; Shaojun Shi; Xuelong Li

    Multi-view clustering is a hot research topic in machine learning and pattern recognition; however, it still suffers from high computational complexity when clustering multi-view data sets. Although a number of approaches have been proposed to accelerate the computation, most of them do not consider the data duality between features and samples. In this paper, we propose a novel co-clustering approach termed Fast Multi-view Bilateral K-means (FMVBKM), which can perform the clustering task on the rows and columns of the input data matrix simultaneously. Specifically, FMVBKM applies the relaxed K-means clustering technique to multi-view data clustering. In addition, to decrease information loss in matrix factorization, we further introduce a new co-clustering method named Fast Multi-view Matrix Tri-Factorization (FMVMTF). Extensive experimental results on six benchmark data sets show that the two proposed approaches not only have comparable clustering performance but also offer high computational efficiency, in comparison with state-of-the-art multi-view clustering methods.

    Updated: 2020-01-22
  • PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning
    Pattern Recogn. (IF 5.898) Pub Date : 2020-01-21
    Guangyao Zhai; Liang Liu; Linjian Zhang; Yong Liu; Yunliang Jiang

    Visual ego-motion estimation is one of the longstanding problems of estimating the movement of cameras from images. Learning-based ego-motion estimation methods have attracted increasing attention owing to their desirable robustness to image noise and independence from camera calibration. In this work, we propose a data-driven approach to learning-based visual ego-motion estimation for a monocular camera. We use an end-to-end learning approach, allowing the model to learn a map from input image pairs to the corresponding ego-motion, which is parameterized as a 6-DoF transformation matrix. We introduce a two-module long-term recurrent convolutional neural network called PoseConvGRU. The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature in consecutive image pairs. The visual memory is implemented with convolutional gated recurrent units, which allow propagating information over time. At each time step, two consecutive RGB images are stacked together to form a 6-channel tensor for the feature-encoding module to learn how to extract motion information and estimate poses. The sequence of output maps is then passed through the memory-propagating module to generate the relative transformation pose of each image pair. In addition, we have designed a series of data augmentation methods to avoid overfitting and to improve the performance of the model in challenging scenarios such as high-speed or reverse driving. We evaluate our approach on the KITTI Visual Odometry benchmark and the Malaga 2013 dataset. The experiments show that the proposed method is competitive with state-of-the-art monocular geometric and learning methods, and they encourage further exploration of learning-based methods for estimating camera ego-motion, even though geometric methods also demonstrate promising results.
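    The input construction described above (two consecutive RGB frames stacked into a 6-channel tensor) can be sketched as follows; the frame size is an arbitrary placeholder, not a value from the paper.

```python
# Minimal sketch of stacking an image pair along the channel axis to form
# the 6-channel input for a feature-encoding module.
import numpy as np

def stack_image_pair(frame_t, frame_t1):
    """Stack two HxWx3 RGB frames into one HxWx6 tensor."""
    assert frame_t.shape == frame_t1.shape and frame_t.shape[-1] == 3
    return np.concatenate([frame_t, frame_t1], axis=-1)

a = np.zeros((128, 416, 3), dtype=np.float32)  # frame at time t
b = np.ones((128, 416, 3), dtype=np.float32)   # frame at time t+1
pair = stack_image_pair(a, b)
print(pair.shape)  # (128, 416, 6)
```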

    Updated: 2020-01-22
  • Ethical approaches and autonomous systems
    Artif. Intell. (IF 4.483) Pub Date : 2020-01-21
    T.J.M. Bench-Capon

    In this paper we consider how the three main approaches to ethics – deontology, consequentialism and virtue ethics – relate to the implementation of ethical agents. We provide a description of each approach and how agents might be implemented by designers following the different approaches. Although there are numerous examples of agents implemented within the consequentialist and deontological approaches, this is not so for virtue ethics. We therefore propose a novel means of implementing agents within the virtue ethics approach. It is seen that each approach has its own particular strengths and weaknesses when considered as the basis for implementing ethical agents, and that the different approaches are appropriate to different kinds of system.

    Updated: 2020-01-22
  • Multi-scale affined-HOF and dimension selection for view-unconstrained action recognition
    Appl. Intell. (IF 2.882) Pub Date : 2020-01-22
    Dinh Tuan Tran, Hirotake Yamazoe, Joo-Ho Lee

    In this paper, an action recognition method that can adaptively handle the problems of variations in camera viewpoint is introduced. Our contribution is three-fold. First, a space-sampling algorithm based on affine transforms at multiple scales is proposed to yield a series of different viewpoints from a single one. A histogram of dense optical flow is then extracted over each fixed-size patch of a given generated viewpoint as a local feature descriptor. Second, a dimension selection procedure is proposed to retain only the dimensions of the feature vector space that carry distinctive information and to discard the unnecessary ones. Third, to adapt to situations in which video data from multiple viewpoints are available for training, an extended method with a voting algorithm is introduced to increase the recognition accuracy. The proposed method is validated by experiments on both simulated and realistic datasets (http://www.aislab.org/index.php/en/mvar-datasets). The method is found to be accurate and capable of maintaining its accuracy under a wide range of viewpoint changes. In addition, the method is less sensitive to variations in subject scale, subject position, action speed, partial occlusion, and background. The method is also validated by comparison with state-of-the-art view-invariant action recognition methods on the well-known i3DPost and MuHAVi public datasets.

    Updated: 2020-01-22
  • STDS: self-training data streams for mining limited labeled data in non-stationary environment
    Appl. Intell. (IF 2.882) Pub Date : 2020-01-21
    Shirin Khezri, Jafar Tanha, Ali Ahmadi, Arash Sharifi

    In this article, we focus on the semi-supervised classification problem in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. There are several approaches to semi-supervised learning in stationary environments that are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and employs a mechanism to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric for identifying a set of high-confidence predictions and a proper underlying base learner. We therefore propose an ensemble approach to find a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is built from the new set of labeled data in the current chunk; otherwise, a percentage of high-confidence newly labeled data in the current chunk, chosen according to the proposed selection metric, is added to the labeled data in the next chunk to update the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms supervised and most other semi-supervised learning methods.
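    The chunk-to-chunk drift check based on KL divergence can be illustrated as below. This is a generic sketch: the histogram binning scheme and the drift threshold are assumptions, not the paper's actual estimator.

```python
# Sketch: compare the (discretized) feature distributions of two successive
# chunks via KL divergence; flag drift when the divergence exceeds a threshold.
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two (unnormalized) histograms, with smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def drift_detected(prev_chunk, curr_chunk, bins=10, threshold=0.5):
    lo = min(prev_chunk.min(), curr_chunk.min())
    hi = max(prev_chunk.max(), curr_chunk.max())
    p, _ = np.histogram(prev_chunk, bins=bins, range=(lo, hi))
    q, _ = np.histogram(curr_chunk, bins=bins, range=(lo, hi))
    return kl_divergence(p, q) > threshold

rng = np.random.default_rng(0)
stable = rng.normal(0, 1, 500)
shifted = rng.normal(3, 1, 500)   # simulated concept drift
print(drift_detected(stable, rng.normal(0, 1, 500)))  # False
print(drift_detected(stable, shifted))                # True
```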

    Updated: 2020-01-22
  • Population-based metaheuristics for Association Rule Text Mining
    arXiv.cs.NE Pub Date : 2020-01-17
    Iztok Fister Jr.; Suash Deb; Iztok Fister

    Nowadays, the majority of data on the Internet is held in an unstructured format, such as websites and e-mails. The importance of analyzing these data has been growing day by day. Similar to data mining on structured data, text mining methods for handling unstructured data have also received increasing attention from the research community. This paper deals with the problem of Association Rule Text Mining. To solve the problem, we propose the PSO-ARTM method, which consists of three steps: text preprocessing, Association Rule Text Mining using population-based metaheuristics, and text postprocessing. The method was applied to a transaction database obtained from professional triathlon athletes' blogs and news posted on their websites. The obtained results reveal that the proposed method is suitable for Association Rule Text Mining and therefore offers a promising way for further development.

    Updated: 2020-01-22
  • Multi-factorial Optimization for Large-scale Virtual Machine Placement in Cloud Computing
    arXiv.cs.NE Pub Date : 2020-01-18
    Zhengping Liang; Jian Zhang; Liang Feng; Zexuan Zhu

    The placement scheme of virtual machines (VMs) onto physical servers (PSs) is crucial to lowering operational cost for cloud providers. Evolutionary algorithms (EAs) have shown promise in solving virtual machine placement (VMP) problems in the past. However, with the growing demand for cloud services, existing EAs do not scale to the large-scale virtual machine placement (LVMP) problem due to their high time complexity and poor scalability. Recently, multi-factorial optimization (MFO) has surfaced as a new search paradigm in evolutionary computing. It offers the ability to evolve multiple optimization tasks simultaneously during the evolutionary process. This paper aims to apply MFO to the LVMP problem in heterogeneous environments. First, we formulate a deployment-cost-based VMP problem in the form of an MFO problem. Then, a multi-factorial evolutionary algorithm (MFEA) embedded with a greedy-based allocation operator is developed to address the established MFO problem. After that, a re-migration and merge operator is designed to derive an integrated solution of the LVMP problem from the solutions of the MFO problem. To assess the effectiveness of our proposed method, simulation experiments are carried out on large-scale and extra-large-scale VM test data sets. The results show that, compared with various heuristic methods, our method can shorten the optimization time significantly and offer a competitive placement solution for the LVMP problem in heterogeneous environments.

    Updated: 2020-01-22
  • Memory capacity of neural networks with threshold and ReLU activations
    arXiv.cs.NE Pub Date : 2020-01-20
    Roman Vershynin

    Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks -- those with more connections than the size of the training data -- are often able to memorize the training data with $100\%$ accuracy. This was rigorously proved for networks with sigmoid activation functions and, very recently, for ReLU activations. Addressing a 1988 open question of Baum, we prove that this phenomenon holds for general multilayered perceptrons, i.e. neural networks with threshold activation functions, or with any mix of threshold and ReLU activations. Our construction is probabilistic and exploits sparsity.

    Updated: 2020-01-22
  • MOEA/D with Random Partial Update Strategy
    arXiv.cs.NE Pub Date : 2020-01-20
    Yuri Lavinas; Claus Aranha; Marcelo Ladeira; Felipe Campelo

    Recent studies on resource allocation suggest that some subproblems are more important than others in the context of the MOEA/D, and that focusing on the most relevant ones can consistently improve the performance of that algorithm. These studies share the common characteristic of updating only a fraction of the population at any given iteration of the algorithm. In this work we investigate a new, simpler partial update strategy, in which a random subset of solutions is selected at every iteration. The performance of the MOEA/D using this new resource allocation approach is compared experimentally against that of the standard MOEA/D-DE and the MOEA/D with relative improvement-based resource allocation. The results indicate that using the MOEA/D with this new partial update strategy results in improved HV and IGD values, and a much higher proportion of non-dominated solutions, particularly as the number of updated solutions at every iteration is reduced.
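    The core of the random partial update strategy, selecting a random subset of solutions to update at every iteration, reduces to a few lines; the surrounding MOEA/D-DE machinery is omitted, and `update_fraction` is an assumed parameter name, not the paper's notation.

```python
# Sketch: at each generation, pick a random fraction of the subproblem
# indices; only these solutions undergo variation and update.
import random

def partial_update_indices(pop_size, update_fraction, rng=random):
    """Pick a random subset of subproblem indices to update this generation."""
    k = max(1, int(round(pop_size * update_fraction)))
    return rng.sample(range(pop_size), k)

random.seed(42)
idx = partial_update_indices(pop_size=100, update_fraction=0.1)
print(len(idx))  # 10 subproblems updated this generation
```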

    Updated: 2020-01-22
  • An Efficient Framework for Automated Screening of Clinically Significant Macular Edema
    arXiv.cs.NE Pub Date : 2020-01-20
    Renoh Johnson Chalakkal; Faizal Hafiz; Waleed Abdulla; Akshya Swain

    The present study proposes a new approach to automated screening for Clinically Significant Macular Edema (CSME) and addresses two major challenges associated with such screenings, i.e., exudate segmentation and imbalanced datasets. The proposed approach replaces conventional exudate-segmentation-based feature extraction by combining a pre-trained deep neural network with meta-heuristic feature selection. A feature-space over-sampling technique is used to overcome the effects of skewed datasets, and the screening is accomplished by a k-NN based classifier. The role of each data-processing step (e.g., class balancing, feature selection) and the effects of limiting the region-of-interest to the fovea on the classification performance are critically analyzed. Finally, the selection and implications of the operating point on the Receiver Operating Characteristic curve are discussed. The results of this study convincingly demonstrate that by following these fundamental practices of machine learning, a basic k-NN based classifier can effectively accomplish CSME screening.
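    The final k-NN classification stage can be sketched generically as below; the deep-feature extraction, feature selection, and over-sampling steps are abstracted away, and the toy data and choice of `k` are illustrative.

```python
# Generic k-NN classifier by majority vote among the k nearest neighbors.
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)       # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]          # labels of k nearest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.05], [0.95, 0.95]])))  # [0 1]
```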

    Updated: 2020-01-22
  • DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm
    arXiv.cs.NE Pub Date : 2020-01-21
    Hao Xu; Haibin Chang; Dongxiao Zhang

    Data-driven methods have recently been developed to discover the underlying partial differential equations (PDEs) of physical problems. However, these methods usually require a complete candidate library of potential terms in a PDE. To overcome this limitation, we propose a novel framework combining deep learning and a genetic algorithm, called DLGA-PDE, for discovering PDEs. In the proposed framework, a deep neural network trained on the available data of a physical problem is utilized to generate meta-data and calculate derivatives, and the genetic algorithm is then employed to discover the underlying PDE. Owing to the merits of the genetic algorithm, such as mutation and crossover, DLGA-PDE can work with an incomplete candidate library. The proposed DLGA-PDE is tested on the discovery of the Korteweg-de Vries (KdV) equation, the Burgers equation, the wave equation, and the Chaffee-Infante equation, respectively, as a proof of concept. Satisfactory results are obtained without the need for a complete candidate library, even in the presence of noisy and limited data.

    Updated: 2020-01-22
  • Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
    arXiv.cs.NE Pub Date : 2018-05-21
    Liang Luo; Jacob Nelson; Luis Ceze; Amar Phanishayee; Arvind Krishnamurthy

    Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high performance multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art distributed training techniques for cloud-based ImageNet workloads, with 25% better throughput per dollar.

    Updated: 2020-01-22
  • A Tight Runtime Analysis for the $(\mu+\lambda)$ EA
    arXiv.cs.NE Pub Date : 2018-12-28
    Denis Antipov; Benjamin Doerr

    Despite significant progress in the theory of evolutionary algorithms, the theoretical understanding of evolutionary algorithms which use non-trivial populations remains challenging and only few rigorous results exist. Already for the most basic problem, the determination of the asymptotic runtime of the $(\mu+\lambda)$ evolutionary algorithm on the simple OneMax benchmark function, only the special cases $\mu=1$ and $\lambda=1$ have been solved. In this work, we analyze this long-standing problem and show the asymptotically tight result that the runtime $T$, the number of iterations until the optimum is found, satisfies \[E[T] = \Theta\bigg(\frac{n\log n}{\lambda}+\frac{n}{\lambda / \mu} + \frac{n\log^+\log^+ \lambda/ \mu}{\log^+ \lambda / \mu}\bigg),\] where $\log^+ x := \max\{1, \log x\}$ for all $x > 0$. The same methods allow to improve the previous-best $O(\frac{n \log n}{\lambda} + n \log \lambda)$ runtime guarantee for the $(\lambda+\lambda)$~EA with fair parent selection to a tight $\Theta(\frac{n \log n}{\lambda} + n)$ runtime result.

    Updated: 2020-01-22
  • Dissecting Deep Neural Networks
    arXiv.cs.NE Pub Date : 2019-10-09
    Haakon Robinson; Adil Rasheed; Omer San

    In exchange for large quantities of data and processing power, deep neural networks have yielded models that provide state-of-the-art prediction capabilities in many fields. However, a lack of strong guarantees on their behaviour has raised concerns over their use in safety-critical applications. A first step to understanding these networks is to develop alternate representations that allow for further analysis. It has been shown that neural networks with piecewise affine activation functions are themselves piecewise affine, with their domains consisting of a vast number of linear regions. So far, research on this topic has focused on counting the number of linear regions, rather than obtaining explicit piecewise affine representations. This work presents a novel algorithm that can compute the piecewise affine form of any fully connected neural network with rectified linear unit activations.
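    The underlying fact such an algorithm exploits, that within one linear region a ReLU network is exactly affine, f(x) = Ax + b, can be illustrated by freezing the activation pattern at a point and reading off the affine map. This is a generic illustration of the idea, not the authors' algorithm.

```python
# Read off the local affine form (A, b) of a fully connected ReLU network
# at a point x by propagating the frozen activation mask through the layers.
import numpy as np

def local_affine_form(weights, biases, x):
    """Return (A, b) such that f(y) = A @ y + b on x's linear region."""
    A = np.eye(x.shape[0])
    b = np.zeros(x.shape[0])
    h = x.astype(float)
    for i, (W, c) in enumerate(zip(weights, biases)):
        pre = W @ h + c
        A = W @ A
        b = W @ b + c
        if i < len(weights) - 1:             # ReLU on hidden layers only
            mask = (pre > 0).astype(float)   # frozen activation pattern
            h = pre * mask
            A = mask[:, None] * A
            b = mask * b
        else:
            h = pre
    return A, b

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(5, 3)), rng.normal(size=(2, 5))]
bs = [rng.normal(size=5), rng.normal(size=2)]
x = rng.normal(size=3)

def forward(x):
    h = np.maximum(Ws[0] @ x + bs[0], 0.0)
    return Ws[1] @ h + bs[1]

A, b = local_affine_form(Ws, bs, x)
print(np.allclose(A @ x + b, forward(x)))  # True
```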

    Updated: 2020-01-22
  • Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
    arXiv.cs.NE Pub Date : 2019-10-24
    Dongxu Li; Cristian Rodriguez Opazo; Xin Yu; Hongdong Li

    Vision-based sign language recognition aims at helping deaf people to communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge, it is by far the largest public ASL dataset facilitating word-level sign recognition research. Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios. Specifically, we implement and compare two different models, i.e., (i) a holistic visual appearance-based approach, and (ii) a 2D human pose-based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution network (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, which further boosts the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performance of up to 66% top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. Our dataset and baseline deep models are available at \url{https://dxli94.github.io/WLASL/}.

    Updated: 2020-01-22
  • K-NN active learning under local smoothness assumption
    arXiv.cs.LG Pub Date : 2020-01-17
    Boris Ndjia Njike; Xavier Siebert

    There is a large body of work on convergence rates either in passive or active learning. Here we first outline some of the main results that have been obtained, more specifically in a nonparametric setting under assumptions about the smoothness of the regression function (or the boundary between classes) and the margin noise. We discuss the relative merits of these underlying assumptions by putting active learning in perspective with recent work on passive learning. We design an active learning algorithm with a rate of convergence better than in passive learning, using a particular smoothness assumption customized for k-nearest neighbors. Unlike previous active learning algorithms, we use a smoothness assumption that provides a dependence on the marginal distribution of the instance space. Additionally, our algorithm avoids the strong density assumption that supposes the existence of the density function of the marginal distribution of the instance space and is therefore more generally applicable.

    Updated: 2020-01-22
  • Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory
    arXiv.cs.LG Pub Date : 2020-01-17
    Yunlong Lu; Kai Yan

    Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of methods and applications. Recent works explore learning beyond single-agent scenarios and consider multi-agent settings. However, they are faced with many challenges and seek help from traditional game-theoretic algorithms, which, in turn, show bright application promise when combined with modern algorithms and growing computing power. In this survey, we first introduce basic concepts and algorithms in single-agent RL and multi-agent systems; then, we summarize the related algorithms from three aspects. Solution concepts from game theory inspire algorithms that try to evaluate the agents or find better solutions in multi-agent systems. Fictitious self-play has become popular and has had a great impact on multi-agent reinforcement learning algorithms. Counterfactual regret minimization is an important tool for solving games with incomplete information, and it has shown great strength when combined with deep learning.

    Updated: 2020-01-22
  • Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems
    arXiv.cs.LG Pub Date : 2020-01-17
    Jaimie Drozdal; Justin Weisz; Dakuo Wang; Gaurav Dass; Bingsheng Yao; Changruo Zhao; Michael Muller; Lin Ju; Hui Su

    We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies -- qualitative interviews, a controlled experiment, and a card-sorting task -- to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increases users' trust in and understanding of the tool, and that, of all the proposed features, model performance metrics and visualizations are the most important information for data scientists when establishing their trust in an AutoML tool.

    Updated: 2020-01-22
  • Neighborhood Structure Assisted Non-negative Matrix Factorization and its Application in Unsupervised Point Anomaly Detection
    arXiv.cs.LG Pub Date : 2020-01-17
    Imtiaz Ahmed; Xia Ben Hu; Mithun P. Acharya; Yu Ding

    Dimensionality reduction is considered an important step for ensuring competitive performance in unsupervised learning tasks such as anomaly detection. Non-negative matrix factorization (NMF) is a popular and widely used method to accomplish this goal. But NMF and its recent enhanced versions, such as graph regularized NMF and symmetric NMF, have no provision to include neighborhood structure information and, as a result, may fail to provide satisfactory performance in the presence of nonlinear manifold structure. To address that shortcoming, we propose to incorporate neighborhood structural similarity information within the NMF framework by modeling the data through a minimum spanning tree. What motivates our choice is the understanding that, in the presence of complicated data structure, a minimum spanning tree can approximate the intrinsic distance between two data points better than a simple Euclidean distance does, and consequently, it constitutes a more reasonable basis for differentiating anomalies from the normal-class data. We call the resulting method neighborhood structure assisted NMF. By comparing the formulation and properties of neighborhood structure assisted NMF with other versions of NMF, including graph regularized NMF and symmetric NMF, it is apparent that the inclusion of neighborhood structure information using a minimum spanning tree makes the key difference. We further devise both offline and online algorithmic versions of the proposed method. Empirical comparisons using twenty benchmark datasets as well as an industrial dataset extracted from a hydropower plant demonstrate the superiority of neighborhood structure assisted NMF and support our claim of merit.
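    The structural ingredient, using distances along a minimum spanning tree as a proxy for intrinsic distance, can be illustrated with a stdlib-only sketch; the point set is a toy example, and the full NMF formulation is not reproduced.

```python
# Sketch: on points sampled along a curve, the MST path distance follows the
# curve through intermediate points, while Euclidean distance cuts straight
# across. Prim's algorithm plus a BFS over the resulting tree.
import math
from collections import defaultdict, deque

def euclid(p, q):
    return math.dist(p, q)

def mst_edges(points):
    """Prim's algorithm; returns tree edges as index pairs."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: euclid(points[e[0]], points[e[1]]))
        edges.append((i, j))
        in_tree.add(j)
    return edges

def tree_distance(points, edges, s, t):
    """Sum of edge lengths along the unique MST path from s to t."""
    adj = defaultdict(list)
    for i, j in edges:
        w = euclid(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = {s: 0.0}
    dq = deque([s])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                dq.append(v)
    return dist[t]

pts = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]  # zigzag "manifold"
edges = mst_edges(pts)
d_tree = tree_distance(pts, edges, 0, 4)
print(round(d_tree, 3), round(euclid(pts[0], pts[4]), 3))  # 5.657 4.0
```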

    Updated: 2020-01-22
  • Siamese Graph Neural Networks for Data Integration
    arXiv.cs.LG Pub Date : 2020-01-17
    Evgeny Krivosheev; Mattia Atzeni; Katsiaryna Mirylenka; Paolo Scotton; Fabio Casati

    Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent development in machine learning and in particular deep learning has opened the way to more general and more efficient solutions to data integration problems. In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles. Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible. This is achieved by combining siamese and graph neural networks to propagate information between connected entities and support high scalability. We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.

    Updated: 2020-01-22
  • Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition
    arXiv.cs.LG Pub Date : 2020-01-17
    Sebastian Raschka; Benjamin Kaufman

    In the last decade, machine learning and artificial intelligence applications have received a significant boost in performance and attention in both academic research and industry. The success behind most of the recent state-of-the-art methods can be attributed to the latest developments in deep learning. When applied to various scientific domains that are concerned with the processing of non-tabular data, for example, images or text, deep learning has been shown to outperform not only conventional machine learning but also highly specialized tools developed by domain experts. This review aims to summarize AI-based research for GPCR bioactive ligand discovery with a particular focus on the most recent achievements and research trends. To make this article accessible to a broad audience of computational scientists, we provide instructive explanations of the underlying methodology, including overviews of the most commonly used deep learning architectures and feature representations of molecular data. We highlight the latest AI-based research that has led to the successful discovery of GPCR bioactive ligands. However, an equal focus of this review is on the discussion of machine learning-based technology that has been applied to ligand discovery in general and has the potential to pave the way for successful GPCR bioactive ligand discovery in the future. This review concludes with a brief outlook highlighting the recent research trends in deep learning, such as active learning and semi-supervised learning, which have great potential for advancing bioactive ligand discovery.

    Updated: 2020-01-22
  • Privacy Amplification of Iterative Algorithms via Contraction Coefficients
    arXiv.cs.LG Pub Date : 2020-01-17
    Shahab Asoodeh; Mario Diaz; Flavio P. Calmon

    We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., through an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing Dobrushin's contraction coefficient for total variation distance to an $f$-divergence known as the $E_{\gamma}$-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
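    For context, the $E_{\gamma}$-divergence mentioned above is the hockey-stick divergence from the strong data processing literature. A brief sketch of the standard definitions (notation ours, as commonly stated in that literature):

```latex
% Hockey-stick (E_gamma) divergence, for gamma >= 1:
E_\gamma(P \,\|\, Q) \;=\; \sup_{A} \bigl[ P(A) - \gamma\, Q(A) \bigr]
                     \;=\; \int \bigl( \mathrm{d}P - \gamma\, \mathrm{d}Q \bigr)^{+}

% A mechanism M is (\varepsilon, \delta)-differentially private iff,
% for all neighboring inputs x, x':
E_{e^{\varepsilon}}\!\bigl( M(x) \,\|\, M(x') \bigr) \;\le\; \delta

% Contraction coefficient of a Markov kernel K (strong data processing):
\eta_\gamma(K) \;=\; \sup_{P \ne Q}
    \frac{E_\gamma(PK \,\|\, QK)}{E_\gamma(P \,\|\, Q)}
```

    With $\gamma = 1$ the divergence reduces to total variation distance, recovering Dobrushin's classical coefficient as a special case.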

    Updated: 2020-01-22
  • Harmonic Convolutional Networks based on Discrete Cosine Transform
    arXiv.cs.LG Pub Date : 2020-01-18
    Matej Ulicny; Vladimir A. Krylov; Rozenn Dahyot

    Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In this paper we propose to revert to learning combinations of preset spectral filters by switching to CNNs with harmonic blocks. We rely on Discrete Cosine Transform (DCT) filters, which have excellent energy compaction properties and are widely used for image compression. The proposed harmonic blocks rely on DCT modeling and replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN architectures. We demonstrate how the harmonic networks can be efficiently compressed in a straightforward manner by truncating high-frequency information in harmonic blocks, which is possible due to the redundancies in the spectral domain. We report extensive experimental validation demonstrating the benefits of the introduction of harmonic blocks into state-of-the-art CNN models in image classification, segmentation and edge detection applications.
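    As an illustration of the fixed spectral filters involved, the following sketch (our own, not the authors' code) builds an orthonormal 2-D DCT-II filter bank and applies a toy "harmonic" layer in which only the per-filter combination weights would be learned:

```python
import numpy as np

def dct_filters(k=3):
    """k*k separable DCT-II basis filters, shape (k*k, k, k)."""
    n = np.arange(k)
    basis = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * k))
    basis[0] *= 1 / np.sqrt(k)
    basis[1:] *= np.sqrt(2 / k)          # orthonormal 1-D basis rows
    return np.stack([np.outer(basis[u], basis[v])
                     for u in range(k) for v in range(k)])

def harmonic_layer(image, weights, k=3):
    """Toy harmonic block: correlate the input with fixed DCT filters,
    then take a linear combination (`weights`, one scalar per filter),
    which is the only part that would be trained."""
    filters = dct_filters(k)
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for f, alpha in zip(filters, weights):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[i, j] += alpha * np.sum(image[i:i+k, j:j+k] * f)
    return out
```

    Compression by truncating high-frequency information then amounts to keeping only the first few (low-frequency) filters of the bank.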

    Updated: 2020-01-22
  • Inference for Network Structure and Dynamics from Time Series Data via Graph Neural Network
    arXiv.cs.LG Pub Date : 2020-01-18
    Mengyuan Chen; Jiang Zhang; Zhang Zhang; Lun Du; Qiao Hu; Shuo Wang; Jiaqi Zhu

    Network structures in various backgrounds play important roles in social, technological, and biological systems. However, the observable network structures in real cases are often incomplete or unavailable due to measurement errors or privacy protection issues. Therefore, inferring the complete network structure is useful for understanding complex systems. The existing studies have not fully solved the problem of inferring network structure with partial or no information about connections or nodes. In this paper, we tackle the problem by utilizing time series data generated by network dynamics. We regard network inference based on dynamical time series data as a problem of minimizing errors in predicting future states, and propose a novel data-driven deep learning model called Gumbel Graph Network (GGN) to solve two kinds of network inference problems: network reconstruction and network completion. For the network reconstruction problem, the GGN framework includes two modules: the dynamics learner and the network generator. For the network completion problem, GGN adds a new module called the states learner to infer missing parts of the network. We carried out experiments on discrete and continuous time series data. The experiments show that our method can reconstruct up to 100% of the network structure on the network reconstruction task, and can infer the unknown parts of the structure with up to 90% accuracy when some nodes are missing, with accuracy decaying as the fraction of missing nodes increases. Our framework may find wide application in areas where the network structure is hard to obtain and time series data are rich.

    Updated: 2020-01-22
  • Scalable Bid Landscape Forecasting in Real-time Bidding
    arXiv.cs.LG Pub Date : 2020-01-18
    Aritra Ghosh; Saayan Mitra; Somdeb Sarkhel; Jason Xie; Gang Wu; Viswanathan Swaminathan

    In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price only if it wins the auction. For losing auctions, DSPs can only treat their bidding price as a lower bound for the unknown winning price. In the literature, censored regression is typically used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed-variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms, where significant improvement has been achieved in comparison with the baseline solutions.
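    The censored-regression setup being relaxed can be sketched as follows. This is an illustrative toy with per-auction `mu`/`sigma` inputs; in the paper these would be predicted by a network (allowing heteroscedasticity), and the Gaussian could be replaced by a mixture density:

```python
import numpy as np
from scipy.stats import norm

def censored_nll(mu, sigma, price, won):
    """Negative log-likelihood of a censored Gaussian winning price.
    For won auctions, `price` is the observed winning price; for lost
    auctions it is our bid, a lower bound on the unknown winning price."""
    z = (price - mu) / sigma
    obs_ll = norm.logpdf(z) - np.log(sigma)   # exact observation
    cens_ll = norm.logsf(z)                   # log P(winning price > bid)
    return -np.where(won, obs_ll, cens_ll).sum()

# toy usage: two observed winning prices, two lost auctions (bids only)
mu = np.full(4, 5.0)
sigma = np.ones(4)
price = np.array([4.8, 5.2, 6.0, 3.0])
won = np.array([True, True, False, False])
nll = censored_nll(mu, sigma, price, won)
```

    Minimizing this loss over the parameters of a predictor for `mu` and `sigma` recovers the standard censored-regression fit.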

    Updated: 2020-01-22
  • FlexiBO: Cost-Aware Multi-Objective Optimization of Deep Neural Networks
    arXiv.cs.LG Pub Date : 2020-01-18
    Md Shahriar Iqbal; Jianhai Su; Lars Kotthoff; Pooyan Jamshidi

    One of the key challenges in designing machine learning systems is to determine the right balance amongst several objectives, which are oftentimes incommensurable and conflicting. For example, when designing deep neural networks (DNNs), one often has to trade off between multiple objectives, such as accuracy, energy consumption, and inference time. Typically, there is no single configuration that performs equally well for all objectives. Consequently, one is interested in identifying Pareto-optimal designs. Although different multi-objective optimization algorithms have been developed to identify Pareto-optimal configurations, state-of-the-art multi-objective optimization methods do not consider the different evaluation costs attending the objectives under consideration. This is particularly important for optimizing DNNs: the cost arising on account of assessing the accuracy of DNNs is orders of magnitude higher than that of measuring the energy consumption of pre-trained DNNs. We propose FlexiBO, a flexible Bayesian optimization method, to address this issue. We formulate a new acquisition function based on the improvement of the Pareto hyper-volume weighted by the measurement cost of each objective. Our acquisition function selects the next sample and objective that provide maximum information gain per unit of cost. We evaluated FlexiBO on 7 state-of-the-art DNNs for object detection, natural language processing, and speech recognition. Our results indicate that, compared to other state-of-the-art methods across the 7 architectures we tested, the Pareto front obtained using FlexiBO has, on average, a 28.44% higher contribution to the true Pareto front and achieves 25.64% better diversity.
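    A toy rendering of the cost-aware acquisition idea (our own simplification: two maximized objectives and exact hypervolume improvement in place of the paper's expected improvement):

```python
import numpy as np

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume dominated by a 2-D point set (both objectives
    maximized) relative to a reference point."""
    pts = np.asarray(points, dtype=float)
    pts = pts[np.argsort(-pts[:, 0])]      # sort by objective 0, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:                     # non-dominated staircase step
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def cost_aware_gain(front, candidate, cost):
    """Hypervolume improvement of adding `candidate` to the current
    front, divided by the measurement cost of the objective that would
    be evaluated -- the gain-per-unit-cost idea behind the acquisition."""
    base = hypervolume_2d(front)
    return (hypervolume_2d(np.vstack([front, candidate])) - base) / cost

# toy usage: candidate (2, 2) fills a gap in the front at cost 2
front = np.array([[3.0, 1.0], [1.0, 3.0]])
gain = cost_aware_gain(front, np.array([2.0, 2.0]), cost=2.0)
```

    Between two candidates with equal hypervolume improvement, the one whose next measurement is cheaper wins, which is what steers the search away from expensive evaluations such as full accuracy runs.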

    Updated: 2020-01-22
  • Regularized Cycle Consistent Generative Adversarial Network for Anomaly Detection
    arXiv.cs.LG Pub Date : 2020-01-18
    Ziyi Yang; Iman Soltani Bozchalooi; Eric Darve

    In this paper, we investigate algorithms for anomaly detection. Previous anomaly detection methods focus on modeling the distribution of non-anomalous data provided during training. However, this does not necessarily ensure the correct detection of anomalous data. We propose a new Regularized Cycle Consistent Generative Adversarial Network (RCGAN) in which deep neural networks are adversarially trained to better recognize anomalous samples. This approach is based on leveraging a penalty distribution with a new definition of the loss function and novel use of discriminator networks. It is based on a solid mathematical foundation, and proofs show that our approach has stronger guarantees for detecting anomalous examples compared to the current state-of-the-art. Experimental results on both real-world and synthetic data show that our model leads to significant and consistent improvements on previous anomaly detection benchmarks. Notably, RCGAN improves on the state-of-the-art on the KDDCUP, Arrhythmia, Thyroid, Musk and CIFAR10 datasets.

    Updated: 2020-01-22
  • Machine Learning in Quantitative PET Imaging
    arXiv.cs.LG Pub Date : 2020-01-18
    Tonghe Wang; Yang Lei; Yabo Fu; Walter J. Curran; Tian Liu; Xiaofeng Yang

    This paper reviewed machine learning-based studies for quantitative positron emission tomography (PET). Specifically, we summarized the recent developments of machine learning-based methods in PET attenuation correction and low-count PET reconstruction by listing and comparing the proposed methods, study designs and reported performances of the current published studies, with brief discussion of representative studies. The contributions and challenges among the reviewed studies were summarized and highlighted in the discussion section.

    Updated: 2020-01-22
  • Adaptive Parameterization for Neural Dialogue Generation
    arXiv.cs.LG Pub Date : 2020-01-18
    Hengyi Cai; Hongshen Chen; Cheng Zhang; Yonghao Song; Xiaofang Zhao; Dawei Yin

    Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generating generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations.

    Updated: 2020-01-22
  • Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning
    arXiv.cs.LG Pub Date : 2020-01-18
    Samaneh Hosseini Semnani; Hugh Liu; Michael Everett; Anton de Ruiter; Jonathan P. How

    This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a prior supervised learning (SL) step but also decreases the chance of collision in crowded environments. This improved performance, but many failure cases remained. We therefore developed a hybrid approach that leverages the simpler FMP approach in stuck, simple and high-risk cases, and continues using RL for normal cases in which FMP cannot produce an optimal path. We also extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms, producing up to 50% more successful scenarios than deep RL and taking up to 75% less extra time to reach the goal than FMP.

    Updated: 2020-01-22
  • Graph Ordering: Towards the Optimal by Learning
    arXiv.cs.LG Pub Date : 2020-01-18
    Kangfei Zhao; Yu Rong; Jeffrey Xu Yu; Junzhou Huang; Hao Zhang

    Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection. These models are usually designed to preserve vertex information at different granularity and reduce problems in discrete space to machine learning tasks in continuous space. However, despite this fruitful progress, some kinds of graph applications, such as graph compression and edge partition, are very hard to reduce to graph representation learning tasks. Moreover, these problems are closely related to reformulating a global layout for a specific graph, which is an important NP-hard combinatorial optimization problem: graph ordering. In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach. Distinguished from greedy algorithms based on predefined heuristics, we propose a neural network model, Deep Order Network (DON), to capture the hidden locality structure from partial vertex order sets. Supervised by sampled partial orders, DON has the ability to infer unseen combinations. Furthermore, to alleviate the combinatorial explosion in the training space of DON and to make partial vertex order sampling efficient, we employ a reinforcement learning model, the Policy Network, to automatically adjust the partial order sampling probabilities during the training phase of DON. In this way, the Policy Network improves training efficiency and guides DON to evolve towards a more effective model automatically. Comprehensive experiments on both synthetic and real data validate that DON-RL consistently outperforms the current state-of-the-art heuristic algorithm. Two case studies on graph compression and edge partitioning demonstrate the potential power of DON-RL in real applications.

    Updated: 2020-01-22
  • OIAD: One-for-all Image Anomaly Detection with Disentanglement Learning
    arXiv.cs.LG Pub Date : 2020-01-18
    Shuo Wang; Tianle Chen; Shangyu Chen; Carsten Rudolph; Surya Nepal; Marthie Grobler

    Anomaly detection aims to recognize samples with anomalous and unusual patterns with respect to a set of normal data, which is significant for numerous domain applications, e.g. industrial inspection, medical imaging, and security enforcement. There are two key research challenges associated with existing anomaly detection approaches: (1) many of them perform well on low-dimensional problems, but their performance on high-dimensional instances, such as images, is limited; (2) many of them still rely on traditional supervised approaches and manual engineering of features, while the topic has not been fully explored using modern deep learning approaches, even when well-labeled samples are limited. In this paper, we propose a One-for-all Image Anomaly Detection system (OIAD) based on disentangled learning using only clean samples. Our key insight is that the impact of a small perturbation on the latent representation can be bounded for normal samples, while anomaly images usually fall outside such bounded intervals; we call this structure consistency. We implement this idea and evaluate its performance for anomaly detection. Our experiments with three datasets show that OIAD can detect over 90% of anomalies while maintaining a low false alarm rate. It can also detect suspicious samples among samples labeled as clean, coinciding with what humans would deem unusual.

    Updated: 2020-01-22
  • Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
    arXiv.cs.LG Pub Date : 2020-01-18
    Anubha Pandey; Ashish Mishra; Vinay Kumar Verma; Anurag Mittal; Hema A. Murthy

    Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all the classes are available during training. The assumption may not always be practical since the data of a few classes may be unavailable, or the classes may not appear at the time of training. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on a Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While SAN generates high-quality samples, SN learns a better distance metric compared to that of nearest neighbor search. The capability of the generative model to synthesize image features based on the sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our proposed approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL settings. The proposed method yields a significant improvement in standard ZSL as well as in the more challenging generalized ZSL setting (GZSL) for SBIR.

    Updated: 2020-01-22
  • Deep Collaborative Embedding for information cascade prediction
    arXiv.cs.LG Pub Date : 2020-01-18
    Yuhui Zhao; Ning Yang; Tao Lin; Philip S. Yu

    Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of the existing works. First, the existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, the existing works often ignore the prediction of the infection order, which also plays an important role in social network analysis. Third, the existing works often require knowledge of the underlying diffusion networks, which are likely unobservable in practice. In this paper, we aim at the prediction of both node infection and infection order without requiring knowledge of the underlying diffusion mechanism or the diffusion network, where the challenges are two-fold. The first is what cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can capture not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework to learn the node embeddings with cascade collaboration and node collaboration, so that the non-linearity of information cascades can be effectively captured. The results of extensive experiments conducted on real-world datasets verify the effectiveness of our approach.

    Updated: 2020-01-22
  • Learning to See Analogies: A Connectionist Exploration
    arXiv.cs.LG Pub Date : 2020-01-18
    Douglas S. Blank

    This dissertation explores the integration of learning and analogy-making through the development of a computer program, called Analogator, that learns to make analogies by example. By "seeing" many different analogy problems, along with possible solutions, Analogator gradually develops an ability to make new analogies. That is, it learns to make analogies by analogy. This approach stands in contrast to most existing research on analogy-making, in which typically the a priori existence of analogical mechanisms within a model is assumed. The present research extends standard connectionist methodologies by developing a specialized associative training procedure for a recurrent network architecture. The network is trained to divide input scenes (or situations) into appropriate figure and ground components. Seeing one scene in terms of a particular figure and ground provides the context for seeing another in an analogous fashion. After training, the model is able to make new analogies between novel situations. Analogator has much in common with lower-level perceptual models of categorization and recognition; it thus serves as a unifying framework encompassing both high-level analogical learning and low-level perception. This approach is compared and contrasted with other computational models of analogy-making. The model's training and generalization performance is examined, and limitations are discussed.

    Updated: 2020-01-22
  • Efficient Neural Architecture Search: A Broad Version
    arXiv.cs.LG Pub Date : 2020-01-18
    Zixiang Ding; Yaran Chen; Nannan Li; Dongbin Zhao; C. L. Philip Chen

    Efficient Neural Architecture Search (ENAS) achieves novel efficiency in learning high-performance architectures via parameter sharing, but suffers from slow propagation through search models with deep topology. In this paper, we propose a broad version of ENAS (BENAS) to solve this issue by learning a broad architecture with fast propagation speed, using the reinforcement learning and parameter sharing of ENAS, thereby achieving higher search efficiency. In particular, we elaborately design the Broad Convolutional Neural Network (BCNN), the search paradigm of BENAS, which achieves fast forward and backward propagation through its broad topology. The proposed BCNN extracts multi-scale features and enhancement representations and feeds them into a global average pooling layer to yield more reasonable and comprehensive representations, so that BCNN can achieve satisfactory performance even with a shallow topology. To verify the effectiveness of BENAS, several experiments were performed; the results show that 1) BENAS completes the search in 0.23 days, 2x less than ENAS, 2) small-size BCNNs with 0.5 and 1.1 million parameters, based on the architecture learned by BENAS, obtain state-of-the-art performance of 3.63% and 3.40% test error on CIFAR-10, and 3) a BCNN based on the learned architecture achieves 25.3% top-1 error on ImageNet using just 3.9 million parameters.

    Updated: 2020-01-22
  • How do Data Science Workers Collaborate? Roles, Workflows, and Tools
    arXiv.cs.LG Pub Date : 2020-01-18
    Amy X. Zhang; Michael Muller; Dakuo Wang

    Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.

    Updated: 2020-01-22
  • Teaching Software Engineering for AI-Enabled Systems
    arXiv.cs.LG Pub Date : 2020-01-18
    Christian Kästner; Eunsuk Kang

    Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that are scalable, responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers ethics and fairness as well. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.

    Updated: 2020-01-22
  • Fair Transfer of Multiple Style Attributes in Text
    arXiv.cs.LG Pub Date : 2020-01-18
    Karan Dabas; Nishtha Madan; Vijay Arya; Sameep Mehta; Gautam Singh; Tanmoy Chakraborty

    To preserve anonymity and obfuscate their identity on online platforms, users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, a style transfer mechanism is needed that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or rewritten text incorporating a desired degree of multiple soft styles such as female-quality, politeness, or formality. In this work, we demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers, because each single style-transfer step often reverses or dominates the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate superior performance compared to existing single-style transfers performed in sequence.

    Updated: 2020-01-22
  • Adaptive Stochastic Optimization
    arXiv.cs.LG Pub Date : 2020-01-18
    Frank E. Curtis; Katya Scheinberg

    Optimization lies at the heart of machine learning and signal processing. Contemporary approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application. This article summarizes recent research and motivates future work on adaptive stochastic optimization methods, which have the potential to offer significant computational savings when training large-scale systems.

    Updated: 2020-01-22
  • A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints
    arXiv.cs.LG Pub Date : 2020-01-18
    Marek Śmieja; Łukasz Struski; Mário A. T. Figueiredo

    In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.

    Updated: 2020-01-22
  • Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods
    arXiv.cs.LG Pub Date : 2020-01-18
    Juan Vargas; Lazar Andjelic; Amir Barati Farimani

    Actor critic methods with sparse rewards in model-based deep reinforcement learning typically require a deterministic binary reward function that reflects only two possible outcomes: whether, at each step, the goal has been achieved or not. Our hypothesis is that we can influence an agent to learn faster by applying an external environmental pressure during training, which adversely impacts its ability to get higher rewards. As such, we deviate from the classical paradigm of sparse rewards and add a uniformly sampled reward value to the baseline reward to show that (1) sample efficiency of the training process can be correlated to the adversity experienced during training, (2) it is possible to achieve higher performance in less time and with fewer resources, (3) we can reduce the performance variability experienced from seed to seed, (4) there is a maximum point after which more pressure will not generate better results, and (5) random positive incentives have an adverse effect when using a negative reward strategy, making an agent under those conditions learn poorly and more slowly. These results have been shown to be valid for Deep Deterministic Policy Gradients using Hindsight Experience Replay in a well-known Mujoco environment, but we argue that they could be generalized to other methods and environments as well.
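    The reward modification described above can be sketched in a few lines. This is a hypothetical reading of the idea; the pressure magnitude and the uniform sampling range are our own illustrative choices, not values from the paper:

```python
import numpy as np

def shaped_reward(goal_achieved, rng, pressure=0.5):
    """Sparse binary reward (0 on success, -1 otherwise) plus a
    uniformly sampled negative 'environmental pressure' term that is
    applied during training only."""
    base = 0.0 if goal_achieved else -1.0
    return base - rng.uniform(0.0, pressure)

# toy usage: a failed step under pressure lands in [-1.5, -1.0]
rng = np.random.default_rng(0)
r = shaped_reward(False, rng)
```

    Tuning `pressure` corresponds to the paper's observation that adversity helps up to a maximum point, beyond which more pressure stops producing better results.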

    Updated: 2020-01-22
  • Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
    arXiv.cs.LG Pub Date : 2020-01-18
    Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit

    By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the set of different approaches used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review focuses on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. This range illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to expand rapidly in the coming years.

    Updated: 2020-01-22
  • Dual Stochastic Natural Gradient Descent
    arXiv.cs.LG Pub Date : 2020-01-19
    Borja Sánchez-López; Jesús Cerquides

    Although theoretically appealing, Stochastic Natural Gradient Descent (SNGD) is computationally expensive, has been shown to be highly sensitive to the learning rate, and is not guaranteed to converge. Convergent Stochastic Natural Gradient Descent (CSNGD) aims to solve the last two problems. However, the computational expense of CSNGD is still unacceptable when the number of parameters is large. In this paper we introduce Dual Stochastic Natural Gradient Descent (DSNGD), which takes advantage of dually flat manifolds to obtain a robust alternative to SNGD that is also computationally feasible.

    Updated: 2020-01-22
  • Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
    arXiv.cs.LG Pub Date : 2020-01-19
    Akshay Krishnamurthy; Arya Mazumdar; Andrew McGregor; Soumyabrata Pal

    We present two different approaches for parameter learning in several mixture models in one dimension. Our first approach uses complex-analytic methods and applies to Gaussian mixtures with shared variance, binomial mixtures with shared success probability, and Poisson mixtures, among others. An example result is that $\exp(O(N^{1/3}))$ samples suffice to exactly learn a mixture of $k

    Updated: 2020-01-22
  • Gradient Surgery for Multi-Task Learning
    arXiv.cs.LG Pub Date : 2020-01-19
    Tianhe Yu; Saurabh Kumar; Abhishek Gupta; Sergey Levine; Karol Hausman; Chelsea Finn

    While deep learning and deep reinforcement learning (RL) systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. In this work, we identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach for avoiding such interference between task gradients. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task RL problems, this approach leads to substantial gains in efficiency and performance. Further, it is model-agnostic and can be combined with previously-proposed multi-task architectures for enhanced performance.
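The projection step described above can be sketched in a few lines: when two task gradients conflict (negative inner product), the conflicting component of one gradient is removed along the other, leaving it on the normal plane. This is a minimal pure-Python illustration of that idea, not the authors' implementation.

```python
def project_conflicting(g_i, g_j):
    # Gradient-surgery sketch: if g_i conflicts with g_j (negative
    # inner product), subtract from g_i its component along g_j,
    # projecting g_i onto the normal plane of g_j.
    dot = sum(a * b for a, b in zip(g_i, g_j))
    if dot >= 0:
        return list(g_i)  # no conflict: leave the gradient unchanged
    norm_sq = sum(b * b for b in g_j)
    return [a - (dot / norm_sq) * b for a, b in zip(g_i, g_j)]
```

After projection, the resulting gradient has zero inner product with the conflicting task's gradient, so following it no longer degrades that task to first order.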

    Updated: 2020-01-22
  • Learning Options from Demonstration using Skill Segmentation
    arXiv.cs.LG Pub Date : 2020-01-19
    Matthew Cockcroft; Shahil Mawjee; Steven James; Pravesh Ranchod

    We present a method for learning options from segmented demonstration trajectories. The trajectories are first segmented into skills using nonparametric Bayesian clustering, and a reward function for each segment is then learned using inverse reinforcement learning. From this, a set of inferred trajectories for the demonstration is generated. Option initiation sets and termination conditions are learned from these trajectories using the one-class support vector machine clustering algorithm. We demonstrate our method in the four rooms domain, where an agent is able to autonomously discover usable options from human demonstration. Our results show that these inferred options can then be used to improve learning and planning.

    Updated: 2020-01-22
  • Discriminator Soft Actor Critic without Extrinsic Rewards
    arXiv.cs.LG Pub Date : 2020-01-19
    Daichi Nishio; Daiki Kuyoshi; Toi Tsuneda; Satoshi Yamane

    It is difficult to imitate well in unknown states from only a small amount of expert and sampled data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. Methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a few expert demonstrations; however, they often need to interact with the environment. Soft Q imitation learning addressed these problems, showing that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. To make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC). It uses a reward function based on adversarial inverse reinforcement learning instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.

    Updated: 2020-01-22
  • Distributionally Robust Bayesian Quadrature Optimization
    arXiv.cs.LG Pub Date : 2020-01-19
    Thanh Tang Nguyen; Sunil Gupta; Huong Ha; Santu Rana; Svetha Venkatesh

    Bayesian quadrature optimization (BQO) maximizes the expectation of an expensive black-box integrand taken over a known probability distribution. In this work, we study BQO under distributional uncertainty, in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples. A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set. Though the Monte Carlo estimate is unbiased, it has high variance given a small set of samples and can thus result in a spurious objective function. We adopt the distributionally robust optimization perspective on this problem by maximizing the expected objective under the most adversarial distribution. In particular, we propose a novel posterior sampling based algorithm, distributionally robust BQO (DRBQO), for this purpose. We demonstrate the empirical effectiveness of our proposed framework in synthetic and real-world problems, and characterize its theoretical convergence via Bayesian regret.
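The tension between the plain Monte Carlo estimate and a worst-case objective can be illustrated crudely with bootstrap resampling: take the minimum estimate over perturbed empirical distributions. This pessimistic estimator is a stand-in we chose for the adversarial inner problem, not the DRBQO algorithm itself; all names here are our own.

```python
import random
import statistics

def robust_objective(values, n_resamples, seed):
    # Worst-case (minimum) Monte Carlo estimate over bootstrap
    # resamples of the empirical distribution -- a crude stand-in
    # for the adversarial inner minimization in distributionally
    # robust optimization.
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        estimates.append(statistics.mean(resample))
    return min(estimates)
```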

    Updated: 2020-01-22
  • SQLFlow: A Bridge between SQL and Machine Learning
    arXiv.cs.LG Pub Date : 2020-01-19
    Yi Wang; Yang Yang; Weiguo Zhu; Yi Wu; Xu Yan; Yongfeng Liu; Yu Wang; Liang Xie; Ziyao Gao; Wenjing Zhu; Xiang Chen; Wei Yan; Mingjie Tang; Yuan Tang

    Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialect to support ML. SQLFlow (https://sqlflow.org/sqlflow ) takes another strategy: it works as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and implemented it by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive enough for a wide variety of ML techniques -- supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.

    Updated: 2020-01-22
  • Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based Surrogate Model Via Uncertainty Quantification
    arXiv.cs.LG Pub Date : 2020-01-19
    Ray-Bing Chen; Yuan Wang; C. F. Jeff Wu

    Global optimization of expensive functions has important applications in physical and computer experiments. It is challenging to develop an efficient optimization scheme, because each function evaluation can be costly and the derivative information of the function is often not available. We propose a novel global optimization framework using an adaptive Radial Basis Function (RBF) based surrogate model via uncertainty quantification. The framework consists of two iteration steps. It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored. Then it utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation. The selection criterion adopted here is a sample version of the expected improvement (EI) criterion. We conduct simulation studies with standard test functions, which show that the proposed method has advantages, especially when the true surface is not very smooth. In addition, we propose modified approaches to improve the search performance for identifying global optima and to handle higher-dimensional scenarios.
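The expected improvement criterion mentioned above has a closed form under a Gaussian posterior. A minimal sketch for minimization follows; the function name and the minimization convention are our choices, and this is the classical analytic EI rather than the sample version the authors adopt.

```python
import math

def expected_improvement(mu, sigma, best):
    # EI for minimization under a Gaussian posterior N(mu, sigma^2):
    #   EI = (best - mu) * Phi(z) + sigma * phi(z),  z = (best - mu) / sigma,
    # where phi and Phi are the standard normal pdf and cdf.
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * phi
```

A candidate point with higher posterior uncertainty receives a larger EI at the same predicted mean, which is what drives the exploration behavior of the selection criterion.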

    Updated: 2020-01-22
  • A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application
    arXiv.cs.LG Pub Date : 2020-01-19
    Vidhi Lalchand

    The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning. Overlapping classes are characterized by ambiguous areas in the feature space with a high density of points belonging to both classes. This often occurs in real-world datasets; one such example is numeric data denoting properties of particle decays derived from high-energy accelerators like the Large Hadron Collider (LHC). A significant body of research targeting the class-overlap problem uses ensemble classifiers to boost the performance of algorithms, either by using them iteratively in multiple stages or by using multiple copies of the same model on different subsets of the input training data. The former is called boosting and the latter is called bagging. The algorithm proposed in this thesis targets a challenging classification problem in high energy physics: improving the statistical significance of the Higgs discovery. The underlying dataset used to train the algorithm is experimental data built from the official ATLAS full-detector simulation, with Higgs events (signal) mixed with different background events (background) that closely mimic the statistical properties of the signal, generating class overlap. The algorithm proposed is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics. The algorithm utilizes a unified framework that combines two meta-learning techniques: bagging and boosting. The results show that this combination only works in the presence of a randomization trick in the base learners.

    Updated: 2020-01-22
  • A multimodal deep learning approach for named entity recognition from social media
    arXiv.cs.LG Pub Date : 2020-01-19
    Meysam Asgari-Chenaghlu; M. Reza Feizi-Derakhshi; Leili Farzinvash; Cina Motamed

    Named Entity Recognition (NER) from social media posts is a challenging task. User-generated content, which forms the nature of social media, is noisy and contains grammatical and linguistic errors. This noisy content makes tasks such as named entity recognition much harder. However, some applications, like automatic journalism or information retrieval from social media, require more information about the entities mentioned in groups of social media posts. Conventional methods provide acceptable results on structured and well-typed documents, but when applied to user-generated media they are not satisfactory. One valuable piece of information about an entity is the image related to the text. Combining this multimodal data reduces ambiguity and provides wider information about the entities mentioned. To address this issue, we propose a novel approach utilizing multimodal deep learning. Our solution is able to provide more accurate results on the named entity recognition task. Experimental results, namely the precision, recall and F1 score metrics, show the superiority of our work compared to other state-of-the-art NER solutions.

    Updated: 2020-01-22
  • Optimal Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting
    arXiv.cs.LG Pub Date : 2020-01-19
    Tianyang Hu; Zuofeng Shang; Guang Cheng

    Classifiers built with neural networks handle large-scale high-dimensional data, such as facial images from computer vision, extremely well while traditional statistical methods often fail miserably. In this paper, we attempt to understand this empirical success in high dimensional classification by deriving the convergence rates of excess risk. In particular, a teacher-student framework is proposed that assumes the Bayes classifier to be expressed as ReLU neural networks. In this setup, we obtain a dimension-independent and un-improvable rate of convergence, i.e., $O(n^{-2/3})$, for classifiers trained based on either 0-1 loss or hinge loss. This rate can be further improved to $O(n^{-1})$ when data is separable. Here, $n$ represents the sample size.

    Updated: 2020-01-22
  • Infrequent adverse event prediction in low carbon energy production using machine learning
    arXiv.cs.LG Pub Date : 2020-01-19
    Stefano Coniglio; Anthony J. Dunn; Alain B. Zemkoho

    Machine learning is one of the fastest-growing fields in academia, and many industries are aiming to incorporate machine learning tools into their day-to-day operations. However, the keystone of doing so is recognising when you have a problem that can be solved using machine learning. Adverse event prediction is one such problem. There is a wide range of methods for the production of sustainable energy, and in many of them adverse events can occur that impede energy production and even damage equipment. The two examples of adverse event prediction in sustainable energy production we examine in this paper are foam formation in anaerobic digestion and condenser fouling in steam turbines as used in nuclear power stations. We propose a framework for formalising a classification problem around adverse event prediction, building predictive maintenance models capable of predicting these events before they occur, and testing the reliability of these models.
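The step of formalising adverse event prediction as a classification problem can be sketched as a labeling rule: mark each time step positive if an event occurs within some lead-time horizon, so a classifier trained on these labels predicts events before they happen. The function and its parameters are hypothetical illustrations, not the authors' framework.

```python
def label_windows(n_steps, event_steps, horizon):
    # Label step t positive (1) if an adverse event occurs within the
    # next `horizon` steps, negative (0) otherwise -- turning event
    # prediction into binary classification for predictive maintenance.
    events = set(event_steps)
    labels = []
    for t in range(n_steps):
        upcoming = any((t + k) in events for k in range(1, horizon + 1))
        labels.append(1 if upcoming else 0)
    return labels
```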

    Updated: 2020-01-22
  • SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
    arXiv.cs.LG Pub Date : 2020-01-20
    Ramprasaath R. Selvaraju; Purva Tendulkar; Devi Parikh; Eric Horvitz; Marco Ribeiro; Besmira Nushi; Ece Kamar

    Existing VQA datasets contain questions with varying levels of complexity. While the majority of questions in these datasets require perception for recognizing existence, properties, and spatial relationships of entities, a significant portion of questions pose challenges that correspond to reasoning tasks -- tasks that can only be answered through a synthesis of perception and knowledge about the world, logic and/or reasoning. This distinction allows us to notice when existing VQA models have consistency issues: they answer the reasoning question correctly but fail on associated low-level perception questions. For example, models answer the complex reasoning question "Is the banana ripe enough to eat?" correctly, but fail on the associated perception question "Are the bananas mostly green or yellow?", indicating that the model likely answered the reasoning question correctly but for the wrong reason. We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of 200K new perception questions which serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split. Additionally, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the model to attend to the same parts of the image when answering the reasoning question and the perception sub-questions. We show that SQuINT improves model consistency by 7.8% and marginally improves its performance on the Reasoning questions in VQA, while also displaying qualitatively better attention maps.

    Updated: 2020-01-22
  • Memristor Hardware-Friendly Reinforcement Learning
    arXiv.cs.LG Pub Date : 2020-01-20
    Nan Wu; Adrien Vincent; Dmitri Strukov; Yuan Xie

    Recently, significant progress has been made in solving sophisticated problems across various domains by using reinforcement learning (RL), which allows machines or agents to learn from interactions with environments rather than from explicit supervision. As the end of Moore's law seems to be imminent, emerging technologies that enable high-performance neuromorphic hardware systems are attracting increasing attention. Namely, neuromorphic architectures that leverage memristors, the programmable and nonvolatile two-terminal devices, as synaptic weights in hardware neural networks, are candidates of choice to realize such highly energy-efficient and complex nervous systems. However, one of the challenges for memristive hardware with integrated learning capabilities is the prohibitively large number of write cycles that might be required during the learning process, and this situation is even further exacerbated in RL settings. In this work we propose a memristive neuromorphic hardware implementation for the actor-critic algorithm in RL. By introducing a two-fold training procedure (i.e., ex-situ pre-training and in-situ re-training) and several training techniques, the number of weight updates can be significantly reduced, making the approach suitable for efficient in-situ learning implementations. As a case study, we consider the task of balancing an inverted pendulum, a classical problem in both RL and control theory. We believe that this study shows the promise of using memristor-based hardware neural networks for handling complex tasks through in-situ reinforcement learning.

    Updated: 2020-01-22
  • Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective
    arXiv.cs.LG Pub Date : 2020-01-20
    Shun-ichi Amari

    It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role: When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances.
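The projection phenomenon described above is easy to verify numerically: sample uniformly from a high-dimensional unit sphere (by normalizing Gaussian vectors) and the first coordinate behaves approximately like a Gaussian with variance 1/dim. A small sketch, with function names of our own choosing:

```python
import math
import random

def sample_sphere(dim, rng):
    # Uniform sample on the unit sphere via normalized Gaussians.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def projected_variance(dim, n_samples, seed):
    # Empirical variance of the first coordinate: the projection of
    # the uniform sphere distribution onto one axis, which for large
    # dim is approximately Gaussian with variance 1/dim, i.e. the
    # "negligibly small covariances" in the statement above.
    rng = random.Random(seed)
    xs = [sample_sphere(dim, rng)[0] for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    return sum((x - mean) ** 2 for x in xs) / n_samples
```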

    Updated: 2020-01-22
Contents have been reproduced by permission of the publishers.