• arXiv.cs.AI Pub Date : 2020-04-06
John Burden; Daniel Kudenko

Potential Based Reward Shaping combined with a potential function based on appropriately defined abstract knowledge has been shown to significantly improve learning speed in Reinforcement Learning. MultiGrid Reinforcement Learning (MRL) has further shown that such abstract knowledge in the form of a potential function can be learned almost solely from agent interaction with the environment. However

Updated: 2020-04-08
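
For the reward-shaping entry above, the following is a minimal sketch of the standard potential-based shaping term, r + gamma * Phi(s') - Phi(s). The abstract does not spell the formula out, and the corridor potential used here is a hypothetical hand-written stand-in, not the learned potential the paper refers to.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Add the potential-based shaping term gamma * Phi(s') - Phi(s) to a reward."""
    return reward + gamma * phi_next - phi_s


def potential(state, goal=10):
    """Hypothetical potential on a 1-D corridor: states nearer the goal score higher."""
    return -abs(goal - state)


# Moving from state 3 to state 4 (towards the goal) earns a positive bonus,
# moving from 4 back to 3 earns the matching penalty.
forward = shaped_reward(0.0, potential(3), potential(4), gamma=1.0)
backward = shaped_reward(0.0, potential(4), potential(3), gamma=1.0)
```

Because the bonus is a difference of potentials, it cancels along any cycle when gamma is 1, which is the intuition behind shaping of this form not changing which policies are optimal.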
• arXiv.cs.AI Pub Date : 2020-04-06
Shoaib Ahmed Siddiqui; Dominique Mercier; Andreas Dengel; Sheraz Ahmed

With the rise in the employment of deep learning methods in safety-critical scenarios, interpretability is more essential than ever before. Although many different directions regarding interpretability have been explored for visual modalities, time-series data has been neglected with only a handful of methods tested due to their poor intelligibility. We approach the problem of interpretability in a

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-06
Omer Ben-Porat; Sharon Hirsch; Lital Kuchy; Guy Elad; Roi Reichart; Moshe Tennenholtz

The connection between messaging and action is fundamental both to web applications, such as web search and sentiment analysis, and to economics. However, while prominent online applications exploit messaging in natural (human) language in order to predict non-strategic action selection, the economics literature focuses on the connection between structured stylized messaging to strategic decisions

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-06
Gustavo A. Valencia-Zapata; Okan Ersoy; Carolina Gonzalez-Canas; Michael G. Zentner; Gerhard Klimeck

Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Even though a number of approaches either in the form of a methodology or an algorithm try to minimize performance degradation, they have been isolated efforts with

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Yeping Hu; Wei Zhan; Masayoshi Tomizuka

Accurately predicting the possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate in dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encounter. A number of methodologies have been proposed to solve prediction problems

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Daya Guo; Akari Asai; Duyu Tang; Nan Duan; Ming Gong; Linjun Shou; Daxin Jiang; Jian Yin; Ming Zhou

We study the problem of generating inferential texts of events for a variety of commonsense relations, such as if-else. Existing approaches typically use limited evidence from training examples and learn for each relation individually. In this work, we use multiple knowledge sources as fuel for the model. Existing commonsense knowledge bases like ConceptNet are dominated by taxonomic knowledge

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Pratyay Banerjee; Chitta Baral

Open Domain Question Answering requires systems to retrieve external knowledge and perform multi-hop reasoning by composing knowledge spread over multiple sentences. In the recently introduced open domain question answering challenge datasets, QASC and OpenBookQA, we need to perform retrieval of facts and compose facts to correctly answer questions. In our work, we learn a semantic knowledge ranking

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Rémy Portelas; Katja Hofmann; Pierre-Yves Oudeyer

A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations, which is often approached by training them on a diversity of tasks (or environments). A powerful method to foster diversity is to procedurally generate tasks by sampling their parameters from a multi-dimensional distribution, enabling in particular to propose a different task for each training

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Saeed Rahimi Gorji; Ole-Christoffer Granmo; Sondre Glimsdal; Jonathan Edwards; Morten Goodwin

The Tsetlin Machine (TM) is a machine learning algorithm founded on the classical Tsetlin Automaton (TA) and game theory. It further leverages frequent pattern mining and resource allocation principles to extract common patterns in the data, rather than relying on minimizing output error, which is prone to overfitting. Unlike the intertwined nature of pattern representation in neural networks, a TM

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Richard Meyes; Moritz Schneider; Tobias Meisen

The demand for more transparency of decision-making processes of deep reinforcement learning agents is greater than ever, due to their increased use in safety critical and ethically challenging domains such as autonomous driving. In this empirical study, we address this lack of transparency following an idea that is inspired by research in the field of neuroscience. We characterize the learned representations

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Kazutoshi Shinoda; Akiko Aizawa

We present a deep generative model of question-answer (QA) pairs for machine reading comprehension. We introduce two independent latent random variables into our model in order to diversify answers and questions separately. We also study the effect of explicitly controlling the KL term in the variational lower bound in order to avoid the "posterior collapse" issue, where the model ignores latent variables

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Ziming Li; Sungjin Lee; Baolin Peng; Jinchao Li; Shahin Shayandeh; Jianfeng Gao

Reinforcement-based training methods have emerged as the most popular choice to train an efficient and effective dialog policy. However, these methods are suffering from sparse and unstable reward signals usually returned from the user simulator at the end of the dialog. Besides, the reward signal is manually designed by human experts which requires domain knowledge. A number of adversarial learning

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Shu Chen; Zeqian Ju; Xiangyu Dong; Hongchao Fang; Sicheng Wang; Yue Yang; Jiaqi Zeng; Ruisi Zhang; Ruoyu Zhang; Meng Zhou; Penghui Zhu; Pengtao Xie

Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build a large-scale medical dialogue dataset -- MedDialog -- that contains 1.1 million conversations between patients and doctors and 4 million utterances

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Germán Kruszewski; Ionut-Teodor Sorodoc; Tomas Mikolov

Continual Learning has often been framed as the problem of training a model on a sequence of tasks. In this regard, Neural Networks have been observed to forget the solutions to previous tasks as they learn new ones. Yet, modelling human life-long learning does not necessarily require any crisp notion of tasks. In this work, we propose a benchmark based on language modelling in a multilingual and multidomain

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Loïck Bonniot; Christoph Neumann; François Taïani

Diagnosing problems in Internet-scale services remains particularly difficult and costly for both content providers and ISPs. Because the Internet is decentralized, the cause of such problems might lie anywhere between an end-user's device and the service datacenters. Further, the set of possible problems and causes is not known in advance, making it impossible in practice to train a classifier with

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-03
Ioannis D. Apostolopoulos; Peter P. Groumpos; Dimitris I. Apostolopoulos

Purpose: In this study, the recently emerged advances in Fuzzy Cognitive Maps (FCM) are investigated and employed, for achieving the automatic and non-invasive diagnosis of Coronary Artery Disease (CAD). Methods: A Computer-Aided Diagnostic model for the acceptable and non-invasive prediction of CAD using the State Space Advanced FCM (AFCM) approach is proposed. Also, a rule-based mechanism is incorporated

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Su Zhu; Jieyu Li; Lu Chen; Kai Yu

Dialogue state tracking (DST) aims at estimating the current dialogue state given all the preceding conversation. For multi-domain DST, the data sparsity problem is a major obstacle due to increased numbers of state candidates and dialogue lengths. To encode the dialogue context efficiently, we propose to utilize the previous dialogue state (predicted) and the current dialogue utterance as the input

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Soham Parikh; Quaizar Vohra; Mitul Tiwari

Conversational AI assistants are becoming popular, and question answering is an important part of any conversational assistant. Using relevant utterances as features in question answering has been shown to improve both the precision and recall for retrieving the right answer by a conversational assistant. Hence, utterance generation has become an important problem with the goal of generating relevant utterances

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-06
Iztok Fister Jr.; Karin Fister; Iztok Fister

The COVID-19 pandemic has already proven itself to be a global challenge. It proves how vulnerable humanity can be. It has also mobilized researchers from different sciences and different countries in the search for a way to fight this potentially fatal disease. In line with this, our study analyses the abstracts of papers related to COVID-19 and coronavirus-related research using association rule text

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Changmao Li; Jinho D. Choi

We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks are used to pre-train the transformers, token- and utterance-level language modeling and utterance order prediction, that learn both token and utterance embeddings for better understanding in dialogue contexts. Then, multi-task learning between the utterance

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-07
Maxwell Crouse; Constantine Nakos; Ibrahim Abdelaziz; Kenneth Forbus

Analogy is core to human cognition. It allows us to solve problems based on prior experience, it governs the way we conceptualize new information, and it even influences our visual perception. The importance of analogy to humans has made it an active area of research in the broader field of artificial intelligence, resulting in data-efficient models that learn and reason in human-like ways. While analogy

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2019-05-23
Long V. Ho; Melissa D. Aczon; David Ledbetter; Randall Wetzel

Despite the success of deep learning models in healthcare, their lack of transparency has impeded their acceptance. The goals of this work were to highlight which features contributed to a recurrent neural network's (RNN) predictions of ICU mortality and compare this information with clinical expectations. Feature contributions to the RNN's predictions for individual patients were computed using two

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2019-07-10
Cheng He; Shihua Huang; Ran Cheng; Kay Chen Tan; Yaochu Jin

Recently, more and more works have proposed to drive evolutionary algorithms using machine learning models. Usually, the performance of such model-based evolutionary algorithms is highly dependent on the training qualities of the adopted models. Since it usually requires a certain amount of data (i.e. the candidate solutions generated by the algorithms) for model training, the performance deteriorates

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-01
Craig Sherstan; Bilal Kartal; Pablo Hernandez-Leal; Matthew E. Taylor

Predictive auxiliary tasks have been shown to improve performance in numerous reinforcement learning works; however, this effect is still not well understood. The primary purpose of the work presented here is to investigate the impact that an auxiliary task's prediction timescale has on the agent's policy performance. We consider auxiliary tasks which learn to make on-policy predictions using temporal

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-04
Vinoth Pandian Sermuga Pandian; Sarah Suleri

User Interface (UI) design is a creative process that involves considerable reiteration and rework. Designers go through multiple iterations of different prototyping fidelities to create a UI design. In this research, we propose to modify the UI design process by assisting it with artificial intelligence (AI). We propose to enable AI to perform repetitive tasks for the designer while allowing the

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-06
Hengyi Cai; Hongshen Chen; Yonghao Song; Cheng Zhang; Xiaofang Zhao; Dawei Yin

Current state-of-the-art neural dialogue models learn from human conversations following the data-driven paradigm. As such, a reliable training corpus is the crux of building a robust and well-behaved dialogue model. However, due to the open-ended nature of human conversations, the quality of user-generated training data varies greatly, and effective training samples are typically insufficient while

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2019-04-10
Phong Nguyen-Ha; Lam Huynh; Esa Rahtu; Janne Heikkila

The problem of predicting a novel view of the scene using an arbitrary number of observations is a challenging problem for computers as well as for humans. This paper introduces the Generative Adversarial Query Network (GAQN), a general learning framework for novel view synthesis that combines Generative Query Network (GQN) and Generative Adversarial Networks (GANs). The conventional GQN encodes input

Updated: 2020-04-08
• arXiv.cs.AI Pub Date : 2020-04-02
Eli Sherman; David Arbour; Ilya Shpitser

In many applied fields, researchers are often interested in tailoring treatments to unit-level characteristics in order to optimize an outcome of interest. Methods for identifying and estimating treatment policies are the subject of the dynamic treatment regime literature. Separately, in many settings the assumption that data are independent and identically distributed does not hold due to inter-subject

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-02
Ramtin Keramati; Emma Brunskill

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundation of how to

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-02
Ran Wang; Kun Tao; Dingjie Song; Zhilong Zhang; Xiao Ma; Xi'ao Su; Xinyu Dai

Existing question answering systems can only predict answers without explicit reasoning processes, which hinder their explainability and make us overestimate their ability of understanding and reasoning over natural language. In this work, we propose a novel task of reading comprehension, in which a model is required to provide final answers and reasoning processes. To this end, we introduce a formalism

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-02
Benjamin Doerr

One hope of using non-elitism in evolutionary computation is that it aids leaving local optima. We perform a rigorous runtime analysis of a basic non-elitist evolutionary algorithm (EA), the $(\mu,\lambda)$ EA, on the most basic benchmark function with a local optimum, the jump function. We prove that for all reasonable values of the parameters and the problem, the expected runtime of the $(\mu,\lambda)$

Updated: 2020-04-06
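
For the runtime-analysis entry above, here is a rough sketch of the jump benchmark and one generation of a $(\mu,\lambda)$ EA. It assumes the common textbook definitions (jump function with gap size `k`, uniform parent choice, bit-flip mutation with rate 1/n, comma selection among offspring only); the truncated abstract does not state these details, so treat the parameter choices as illustrative.

```python
import random


def jump(x, k):
    """Jump function with gap k: fitness rises with the number of ones,
    except in a valley of width k - 1 just below the all-ones optimum."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones


def comma_generation(parents, lam, k, rng):
    """One (mu, lambda) generation: create lam offspring by mutation and keep
    the best mu of them. Parents are always discarded (non-elitism).
    Assumes lam >= len(parents)."""
    n = len(parents[0])
    offspring = []
    for _ in range(lam):
        parent = rng.choice(parents)
        # Flip each bit independently with probability 1/n.
        child = [bit ^ (rng.random() < 1.0 / n) for bit in parent]
        offspring.append(child)
    offspring.sort(key=lambda x: jump(x, k), reverse=True)
    return offspring[:len(parents)]


rng = random.Random(0)
population = [[0] * 10 for _ in range(2)]
population = comma_generation(population, lam=6, k=3, rng=rng)
```

With n = 10 and k = 3, a string with 7 ones sits on the local optimum, while strings with 8 or 9 ones fall into the fitness valley that the algorithm has to cross.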
• arXiv.cs.AI Pub Date : 2020-04-03
Supriyo Ghosh; Sean Laguna; Shiau Hong Lim; Laura Wynter; Hasan Poonawala

Air traffic control is an example of a highly challenging operational problem that is readily amenable to human expertise augmentation via decision support technologies. In this paper, we propose a new intelligent decision making framework that leverages multi-agent reinforcement learning (MARL) to dynamically suggest adjustments of aircraft speeds in real-time. The goal of the system is to enhance

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-03
Sebastien Gros; Mario Zanon

Model Predictive Control has been recently proposed as policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods seem to be the most

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-03
Zina Ibrahim; Honghan Wu; Richard Dobson

Many areas of research are characterised by the deluge of large-scale, high-dimensional time-series data. However, using the data available for prediction and decision making is hampered by the current lag in our ability to uncover and quantify true interactions that explain the outcomes. We are interested in areas such as intensive care medicine, which are characterised by i) continuous monitoring

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-02
Zvezdin Besarabov; Todor Kolev

Metaheuristic search strategies have proven their effectiveness against man-made solutions in various contexts. They are generally effective in local search area exploitation, and their overall performance is largely impacted by the balance between exploration and exploitation. Recent developments in parallel local search explore methods to take advantage of the efficient local exploitation of searches

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-03
Paulo R. de O. da Costa; Jason Rhuggenaath; Yingqian Zhang; Alp Akcay

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-03
Samuel Läubli; Sheila Castilho; Graham Neubig; Rico Sennrich; Qinlan Shen; Antonio Toral

The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design -

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-03
Alberto Alvarez; Jose Font; Julian Togelius

We propose modeling designer style in mixed-initiative game content creation tools as archetypical design traces. These design traces are formulated as transitions between design styles; these design styles are in turn found through clustering all intermediate designs along the way to making a complete design. This method is implemented in the Evolutionary Dungeon Designer, a prototype mixed-initiative

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2018-04-23
Sobhan Moosavi; Arnab Nandi; Rajiv Ramnath

Telematics data is becoming increasingly available due to the ubiquity of devices that collect data during drives, for different purposes, such as usage based insurance (UBI), fleet management, navigation of connected vehicles, etc. Consequently, a variety of data-analytic applications have become feasible that extract valuable insights from the data. In this paper, we address the especially challenging

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2019-08-05
Koen Holtman

Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This paper shows how to construct a safety layer that adds corrigibility to arbitrarily advanced utility maximizing agents, including possible future agents with Artificial

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2019-10-18
Ashish Kumar; Toby Buckley; Qiaozhi Wang; Alicia Kavelaars; Ilya Kuzovkin

Success stories of applied machine learning can be traced back to the datasets and environments that were put forward as challenges for the community. The challenge that the community sets as a benchmark is usually the challenge that the community eventually solves. The ultimate challenge of reinforcement learning research is to train real agents to operate in the real environment, but until now there

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2019-11-05
Ramtin Keramati; Christoph Dann; Alex Tamkin; Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies

Updated: 2020-04-06
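
To make the risk-sensitive objective in the entry above concrete, this is a small sketch of an empirical CVaR estimate: the mean of the worst `alpha` fraction of sampled returns. The estimator and the sample data are illustrative assumptions; they are not taken from the paper, which concerns how to explore in order to learn CVaR-optimal policies.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst alpha-fraction of returns (lower returns are worse)."""
    ordered = sorted(returns)
    tail = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:tail]) / tail


# Hypothetical episode returns: the mean looks harmless, the lower tail does not.
returns = [-10, 0, 5, 5]
mean_return = sum(returns) / len(returns)
cvar_half = empirical_cvar(returns, alpha=0.5)
```

Here the expected return is 0.0 while the CVaR at level 0.5 is -5.0, which shows why the two objectives can rank policies differently.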
• arXiv.cs.AI Pub Date : 2020-03-14
Ning Shi

When learning a language, people can quickly expand their understanding of the unknown content by using compositional skills, such as from two words "go" and "fast" to a new phrase "go fast." In recent work of Lake and Baroni (2017), modern Sequence-to-Sequence (seq2seq) Recurrent Neural Networks (RNNs) can make powerful zero-shot generalizations in specifically controlled experiments. However, there

Updated: 2020-04-06
• arXiv.cs.AI Pub Date : 2020-04-01
Dietmar Jannach; Ahtsham Manzoor; Wanling Cai; Li Chen

Recommender systems are software applications that help users to find items of interest in situations of information overload. Current research often assumes a one-shot interaction paradigm, where the users' preferences are estimated based on past observed behavior and where the presentation of a ranked list of suggestions is the main, one-directional form of user interaction. Conversational recommender

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-01
Thomas Hellström; Virginia Dignum; Suna Bensch

In public media as well as in scientific publications, the term "bias" is used in conjunction with machine learning in many different contexts, and with many different meanings. This paper proposes a taxonomy of these different meanings, terminology, and definitions by surveying the primarily scientific literature on machine learning. In some cases, we suggest extensions and modifications to

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-01
David A. Robb; Muneeb I. Ahmad; Carlo Tiseo; Simona Aracri; Alistair C. McConnell; Vincent Page; Christian Dondrup; Francisco J. Chiyah Garcia; Hai-Nguyen Nguyen; Èric Pairet; Paola Ardón Ramírez; Tushar Semwal; Hazel M. Taylor; Lindsay J. Wilson; David Lane; Helen Hastie; Katrin Lohan

Public perceptions of Robotics and Artificial Intelligence (RAI) are important in the acceptance, uptake, government regulation and research funding of this technology. Recent research has shown that the public's understanding of RAI can be negative or inaccurate. We believe effective public engagement can help ensure that public opinion is better informed. In this paper, we describe our first iteration

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-01
Ya-Yen Tsai; Bo Xiao; Edward Johns; Guang-Zhong Yang

Learning from Demonstration is increasingly used for transferring operator manipulation skills to robots. In practice, it is important to cater for limited data and imperfect human demonstrations, as well as underlying safety constraints. This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks. Through interactions within the constrained space

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Roshni G. Iyer; Yizhou Sun; Wei Wang; Justin Gottschlich

Traditional code transformation structures, such as an abstract syntax tree, may have limitations in their ability to extract semantic meaning from code. Others have begun to work on this issue, such as the state-of-the-art Aroma system and its simplified parse tree (SPT). Continuing this research direction, we present a new graphical structure to capture semantics from code using what we refer to

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Riku Arakawa; Shintaro Shiba

We demonstrate the first reinforcement-learning application for robots equipped with an event camera. Because of the considerably lower latency of the event camera, it is possible to achieve much faster control of robots compared with the existing vision-based reinforcement-learning applications using standard cameras. To handle a stream of events for reinforcement learning, we introduced an image-like

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Tai Vu

In recent years, the availability of massive data sets and improved computing power have driven the advent of cutting-edge machine learning algorithms. However, this trend has triggered growing concerns associated with its ethical issues. In response to such a phenomenon, this study proposes a feasible solution that combines ethics and computer science materials in artificial intelligence classrooms

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Sebastien Gros; Mario Zanon; Alberto Bemporad

For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set, ensuring that

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Yong-Lu Li; Liang Xu; Xinpeng Liu; Xijie Huang; Yue Xu; Shiyi Wang; Hao-Shu Fang; Ze Ma; Mingyang Chen; Cewu Lu

Existing image-based activity understanding methods mainly adopt direct mapping, i.e. from image to activity concepts, which may encounter a performance bottleneck because of the huge gap. In light of this, we propose a new path: infer human part states first and then reason out the activities based on part-level semantics. Human Body Part States (PaSta) are fine-grained action semantic tokens, e.g. , which

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Luc Libralesso; Florian Fontan

In this article, we present the anytime tree search algorithm we designed for the 2018 ROADEF/EURO challenge glass cutting problem proposed by the French company Saint-Gobain. The resulting program was ranked first among 64 participants. Its key components are: a new search algorithm called Memory Bounded A* (MBA*) with guide functions, a symmetry breaking strategy, and a pseudo-dominance rule. We

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Anssi Kanervisto; Christian Scheller; Ville Hautamäki

Reinforcement learning (RL) has been successful in training agents in various learning environments, including video-games. However, such work modifies and shrinks the action space from the game's original. This is to avoid trying "pointless" actions and to ease the implementation. Currently, this is mostly done based on intuition, with little systematic research supporting the design decisions. In

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Anssi Kanervisto; Joonas Pussinen; Ville Hautamäki

Behavioural cloning, where a computer is taught to perform a task based on demonstrations, has been successfully applied to various video games and robotics tasks, with and without reinforcement learning. This also includes end-to-end approaches, where a computer plays a video game like humans do: by looking at the image displayed on the screen, and sending keystrokes to the game. As a general approach

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-03-31
Xiao Lei Zhang; Anish Agarwal

The study of unsupervised learning can be generally divided into two categories: imitation learning and reinforcement learning. In imitation learning the machine learns by mimicking the behavior of an expert system whereas in reinforcement learning the machine learns via direct environment feedback. Traditional deep reinforcement learning takes a significant time before the machine starts to converge

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-03-31
Uri Shaham; Tom Zahavy; Cesar Caraballo; Shiwani Mahajan; Daisy Massey; Harlan Krumholz

We propose a novel reinforcement learning-based approach for adaptive and iterative feature selection. Given a masked vector of input features, a reinforcement learning agent iteratively selects certain features to be unmasked, and uses them to predict an outcome when it is sufficiently confident. The algorithm makes use of a novel environment setting, corresponding to a non-stationary Markov Decision

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Luciano Cavalcante Siebert; Rijk Mercuur; Virginia Dignum; Jeroen van den Hoven; Catholijn Jonker

Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what extent

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Weichao Mao; Kaiqing Zhang; Erik Miehling; Tamer Başar

Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of

Updated: 2020-04-03
• arXiv.cs.AI Pub Date : 2020-04-02
Iago París; Raquel Sánchez-Cauce; Francisco Javier Díez

A sum-product network (SPN) is a probabilistic model, based on a rooted acyclic directed graph, in which terminal nodes represent univariate probability distributions and non-terminal nodes represent convex combinations (weighted sums) and products of probability functions. They are closely related to probabilistic graphical models, in particular to Bayesian networks with multiple context-specific

Updated: 2020-04-03
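
The sum-product network definition in the entry above can be illustrated with a tiny hand-built network over two binary variables. This is only a sketch of the stated structure (univariate leaves, weighted sums, products); the variable names, weights, and probabilities are made up for illustration.

```python
from itertools import product as assignments


def leaf(var, p_one):
    """Terminal node: Bernoulli distribution over a single binary variable."""
    return lambda x: p_one if x[var] == 1 else 1.0 - p_one


def product_node(*children):
    """Non-terminal node: product of children over disjoint variables."""
    def evaluate(x):
        value = 1.0
        for child in children:
            value *= child(x)
        return value
    return evaluate


def sum_node(weights, children):
    """Non-terminal node: convex combination (weights sum to 1) of children."""
    return lambda x: sum(w * child(x) for w, child in zip(weights, children))


# Root: a mixture of two fully factorised distributions over variables a and b.
spn = sum_node(
    [0.3, 0.7],
    [
        product_node(leaf("a", 0.9), leaf("b", 0.2)),
        product_node(leaf("a", 0.1), leaf("b", 0.6)),
    ],
)

# Summing the root value over all assignments should give 1 for a valid network.
total = sum(spn({"a": a, "b": b}) for a, b in assignments([0, 1], repeat=2))
```

Evaluating the root at a complete assignment takes one bottom-up pass over the graph, which is what makes inference in such networks cheap relative to the size of the joint distribution.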
Contents have been reproduced by permission of the publishers.
