• arXiv.cs.MA Pub Date : 2020-01-16
Mycal Tucker; Yilun Zhou; Julie Shah

Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.

Updated: 2020-01-17
• arXiv.cs.MA Pub Date : 2018-09-04
Juste Raimbault

According to evolutionary urban theory, co-evolutionary processes are at the center of urban systems dynamics. Their observation, whether empirical or within simulation models, nevertheless remains relatively rare. This chapter focuses on the co-evolution of transportation networks and cities and applies high-performance computing numerical experiments to the SimpopNet co-evolution model in order to understand its behavior. We introduce specific indicators to quantify trajectories of such models for systems of cities, and apply these to exhibit co-evolutionary regimes of the model. This illustrates how the systematic exploration of a simulation model can qualitatively transform the knowledge it provides.

Updated: 2020-01-17
• arXiv.cs.MA Pub Date : 2019-07-26
Weixun Wang; Tianpei Yang; Yong Liu; Jianye Hao; Xiaotian Hao; Yujing Hu; Yingfeng Chen; Changjie Fan; Yang Gao

In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since each agent's selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. Previous works incorporate various multiagent coordination mechanisms into deep learning architectures to facilitate multiagent coordination. However, none of them explicitly considers action semantics between agents, that is, the fact that different actions have different influences on other agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show ASN significantly improves the performance of state-of-the-art DRL approaches compared with several network architectures.

Updated: 2020-01-17
• arXiv.cs.MA Pub Date : 2020-01-15
Iacovos Ioannou; Vasos Vassiliou; Christophoros Christophorou; Andreas Pitsillides

Device-to-Device (D2D) communication is one of the technology components of the evolving 5G architecture, as it promises improvements in energy efficiency, spectral efficiency, overall system capacity, and data rates. These improvements in network performance have spearheaded a vast amount of research in D2D, which has identified significant challenges that need to be addressed before D2D can realize its full potential in emerging 5G networks. Towards this end, this paper proposes the use of a distributed intelligent approach to control the generation of D2D networks. More precisely, the proposed approach uses Belief-Desire-Intention (BDI) intelligent agents with extended capabilities (BDIx) to manage each D2D node independently and autonomously, without the help of the Base Station. The paper includes a detailed algorithmic description for the decision of transmission mode, which maximizes the data rate and minimizes the power consumption while taking into consideration the computational load. Simulations show the applicability of BDI agents in jointly solving D2D challenges.

Updated: 2020-01-16
• arXiv.cs.MA Pub Date : 2020-01-15
Reshef Meir; Gal Shahaf; Ehud Shapiro; Nimrod Talmon

Voting rules may fail to implement the will of the society when only some voters actively participate, and/or in the presence of sybil (fake or duplicate) voters. Here we aim to address social choice in the presence of sybils and the absence of full participation. To do so we assume the status quo (Reality) as an ever-present distinguished alternative, and study Reality Enforcing voting rules, which add virtual votes in support of the status quo. We measure the tradeoff between safety and liveness (the ability of active honest voters to maintain/change the status quo, respectively) in a variety of domains, and show that Reality Enforcing voting is optimal.

Updated: 2020-01-16
• arXiv.cs.MA Pub Date : 2019-06-15
Yuanyuan Shi; Baosen Zhang

In this work, we study the interaction of strategic players in continuous-action Cournot games with limited information feedback. The Cournot game is the essential model for many socio-economic systems where players learn and compete. In addition, in many practical settings these players do not have full knowledge of the system or of each other. In this limited information setting, it becomes important to understand the dynamics and limiting behavior of the players. Specifically, we assume players follow strategies such that in hindsight their payoffs are not exceeded by any single deviating action. Given this no-regret guarantee, we prove that under standard assumptions, the players' joint action (both in the sense of time average and final iteration convergence) converges to the unique Nash equilibrium. In addition, our results naturally extend the existing regret analysis on time average convergence to obtain final iteration convergence rates. Together, our work presents significantly sharper and generalized convergence results, and shows how exploiting the game information feedback can influence the convergence rates.
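
As a rough illustration of the convergence claim, the sketch below runs simultaneous projected gradient ascent in a linear Cournot game. Everything specific here is an assumption made for the example and not taken from the paper: the linear inverse demand `a - b * total`, the constant marginal cost `c`, the step size, and the use of exact gradients (the paper studies limited feedback). It only shows the joint action approaching the unique Nash quantity `(a - c) / (b * (n + 1))`.

```python
def cournot_gradient_play(n=3, a=10.0, b=1.0, c=1.0, step=0.1, rounds=2000):
    """Simultaneous projected gradient ascent in a linear Cournot game.

    Hypothetical setup: price = a - b * (total quantity), marginal cost c.
    Player i's payoff is q_i * (a - b * total) - c * q_i.
    """
    q = [0.0] * n
    for _ in range(rounds):
        total = sum(q)
        # Partial derivative of player i's payoff with respect to q_i.
        grads = [a - c - b * total - b * qi for qi in q]
        # Gradient step, projected back onto non-negative quantities.
        q = [max(0.0, qi + step * g) for qi, g in zip(q, grads)]
    return q


def cournot_nash(n=3, a=10.0, b=1.0, c=1.0):
    """Closed-form symmetric Nash quantity of the linear Cournot game."""
    return (a - c) / (b * (n + 1))


final_q = cournot_gradient_play()
target = cournot_nash()
```

With the default (assumed) parameters every player's quantity settles at the Nash value; a step size that is too large relative to `b * (n + 1)` would make the simultaneous update diverge, which is why the step is kept small here.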

Updated: 2020-01-16
• arXiv.cs.MA Pub Date : 2020-01-13
Anna Guerra; Davide Dardari; Petar M. Djuric

There is growing research interest in the possibility of enriching small flying robots with autonomous sensing and online navigation capabilities. This will enable a large number of applications spanning from remote surveillance to logistics, smarter cities and emergency aid in hazardous environments. In this context, an emerging problem is to track unauthorized small unmanned aerial vehicles (UAVs) hiding behind buildings or concealed in large UAV networks. In contrast with current solutions mainly based on static and on-ground radars, this paper proposes the idea of a dynamic radar network of UAVs for real-time and high-accuracy tracking of malicious targets. To this end, we describe a solution for real-time navigation of UAVs to track a dynamic target using heterogeneously sensed information. Such information is shared by the UAVs with their neighbors via multi-hops, allowing the target to be tracked by a local Bayesian estimator running at each agent. Since not all paths are equal from an information-gathering point of view, the UAVs plan their own trajectory by minimizing the posterior covariance matrix of the target state under UAV kinematic and anti-collision constraints. Our results show how a dynamic network of radars attains better localization results compared to a fixed configuration and how the on-board sensor technology impacts the accuracy in tracking a target with different radar cross sections, especially in non-line-of-sight (NLOS) situations.

Updated: 2020-01-15
• arXiv.cs.MA Pub Date : 2020-01-14
David Balduzzi; Wojciech M Czarnecki; Thomas W Anthony; Ian M Gemp; Edward Hughes; Joel Z Leibo; Georgios Piliouras; Thore Graepel

With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.

Updated: 2020-01-15
• arXiv.cs.MA Pub Date : 2018-06-16
Laurent Bulteau; Gal Shahaf; Ehud Shapiro; Nimrod Talmon

We present a unifying framework encompassing many social choice settings. Viewing each social choice setting as voting in a suitable metric space, we consider a general model of social choice over metric spaces, in which---similarly to the spatial model of elections---each voter specifies an ideal element of the metric space. The ideal element functions as a vote, where each voter prefers elements that are closer to her ideal element. But it also functions as a proposal, thus making all participants equal not only as voters but also as proposers. We consider Condorcet aggregation and a continuum of solution concepts, ranging from minimizing the sum of distances to minimizing the maximum distance. We study applications of the abstract model to various social choice settings, including single-winner elections, committee elections, participatory budgeting, and participatory legislation. For each setting, we compare each solution concept to known voting rules and study various properties of the resulting voting rules. Our framework provides expressive aggregation for a broad range of social choice settings while remaining simple for voters, and may enable a unified and integrated implementation for all these settings, as well as unified extensions such as sybil-resiliency, proxy voting, and deliberative decision making.
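
The two endpoints of the continuum of solution concepts can be sketched on the simplest metric space, the real line. The example is hypothetical: the voter ideals are made-up numbers, and restricting the outcome to the submitted ideal elements is an assumption that follows the "ideal element as proposal" reading of the abstract.

```python
def pick(ideals, aggregate):
    """Return the proposed element whose aggregated distance to all ideals is smallest.

    `aggregate` is `sum` for the sum-of-distances rule and `max` for the
    maximum-distance rule. Ties go to the earliest proposal in the list.
    """
    return min(ideals, key=lambda x: aggregate(abs(x - v) for v in ideals))


# Hypothetical ideal points of five voters on the real line.
ideals = [0, 1, 2, 3, 10]

sum_winner = pick(ideals, sum)  # minimizes total distance
max_winner = pick(ideals, max)  # minimizes the worst-off voter's distance
```

On this input the sum-of-distances rule selects 2 (the median ideal), while the maximum-distance rule selects 3, which lies closer to the outlying voter at 10.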

Updated: 2020-01-15
• arXiv.cs.MA Pub Date : 2020-01-13
Faheem Zafari; Prithwish Basu; Kin K. Leung; Jian Li; Ananthram Swami; Don Towsley

The growing demand for edge computing resources, particularly due to the increasing popularity of the Internet of Things (IoT) and distributed machine/deep learning applications, poses a significant challenge. On the one hand, certain edge service providers (ESPs) may not have sufficient resources to satisfy their applications according to the associated service-level agreements. On the other hand, some ESPs may have additional unused resources. In this paper, we propose a resource-sharing framework that allows different ESPs to optimally utilize their resources and improve the satisfaction level of applications subject to constraints such as communication cost for sharing resources across ESPs. Our framework considers that different ESPs have their own objectives for utilizing their resources, thus resulting in a multi-objective optimization problem. We present an $N$-person Nash Bargaining Solution (NBS) for resource allocation and sharing among ESPs with a Pareto optimality guarantee. Furthermore, we propose a distributed, primal-dual algorithm to obtain the NBS by proving that the strong-duality property holds for the resultant resource sharing optimization problem. Using synthetic and real-world data traces, we show numerically that the proposed NBS-based framework not only enhances the ability to satisfy applications' resource demands, but also improves utilities of different ESPs.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2019-10-10
Michael Blondin; Javier Esparza; Blaise Genest; Martin Helfrich; Stefan Jaax

Angluin et al. proved that population protocols compute exactly the predicates definable in Presburger arithmetic (PA), the first-order theory of addition. As part of this result, they presented a procedure that translates any formula $\varphi$ of quantifier-free PA with remainder predicates (which has the same expressive power as full PA) into a population protocol with $2^{O(\text{poly}(|\varphi|))}$ states that computes $\varphi$. More precisely, the number of states of the protocol is exponential in both the bit length of the largest coefficient in the formula, and the number of nodes of its syntax tree. In this paper, we prove that every formula $\varphi$ of quantifier-free PA with remainder predicates is computable by a leaderless population protocol with $O(\text{poly}(|\varphi|))$ states. Our proof is based on several new constructions, which may be of independent interest. Given a formula $\varphi$ of quantifier-free PA with remainder predicates, a first construction produces a succinct protocol (with $O(|\varphi|^3)$ leaders) that computes $\varphi$; this completes the work initiated in [STACS'18], where we constructed such protocols for a fragment of PA. For large enough inputs, we can get rid of these leaders. If the input is not large enough, then it is small, and we design another construction producing a succinct protocol with one leader that computes $\varphi$. Our last construction gets rid of this leader for small inputs.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2019-11-04
Chainarong Amornbunchornvej; Tanya Berger-Wolf

How do groups of individuals achieve consensus in movement decisions? Do individuals follow their friends, the one predetermined leader, or whomever just happens to be nearby? To address these questions computationally, we formalize the "Coordination Strategy Inference Problem". In this setting, a group of multiple individuals moves in a coordinated manner towards a target path. Each individual uses a specific strategy to follow others (e.g. nearest neighbors, pre-defined leaders, preferred friends). Given a set of time series that includes coordinated movement and a set of candidate strategies as inputs, we provide the first methodology (to the best of our knowledge) to infer whether each individual uses a local-agreement-system or a dictatorship-like strategy to achieve movement coordination at the group level. We evaluate and demonstrate the performance of the proposed framework by predicting the direction of movement of an individual in a group in simulated datasets as well as two real-world datasets: a school of fish and a troop of baboons. Moreover, since there is no prior methodology for inferring individual-level strategies, we compare our framework with the state-of-the-art approach for the task of classification of group-level-coordination models. The results show that our approach is highly accurate in inferring the correct strategy in simulated datasets even in complicated mixed strategy settings, which no existing method can infer. In the task of classification of group-level-coordination models, our framework performs better than the state-of-the-art approach in all datasets. Animal data experiments show that fish, as expected, follow their neighbors, while baboons have a preference to follow specific individuals. Our methodology generalizes to arbitrary time series data of real numbers, beyond movement data.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2017-12-12

Running agent-based models (ABMs) is a burdensome computational task, especially so when considering the flexibility ABMs intrinsically provide. This paper uses a bundle of model configuration parameters along with results obtained from a validated ABM to train some Machine Learning methods for socioeconomic optimal cases. A larger space of possible parameters and combinations of parameters is then used as input to predict optimal cases and confirm parameter calibration. The parameters of the optimal cases are then analyzed and compared to the baseline model. This exploratory initial exercise confirms the adequacy of most of the parameters and rules and suggests changing the direction of two parameters. Additionally, it helps highlight metropolitan regions of higher quality of life. A better understanding of ABM mechanisms and parameters' influence may nudge policy-making slightly closer to an optimal level.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2018-05-08
Mengbin Ye; Minh Hoang Trinh; Young-Hun Lim; Brian D. O. Anderson; Hyo-Sung Ahn

In this paper, inspired by the recent discrete-time model in [1,2], we study two continuous-time opinion dynamics models (Model 1 and Model 2) where the individuals discuss opinions on multiple logically interdependent topics. The logical interdependence between the different topics is captured by a "logic" matrix, which is distinct from the Laplacian matrix capturing interactions between individuals. For each of Model 1 and Model 2, we obtain a necessary and sufficient condition for the network to reach a consensus on each separate topic. The condition on Model 1 involves a combination of the eigenvalues of the logic matrix and Laplacian matrix, whereas the condition on Model 2 requires only separate conditions on the logic matrix and Laplacian matrix. Further investigation of Model 1 yields two sufficient conditions for consensus, and allows us to conclude that one way to guarantee a consensus is to reduce the rate of interaction between individuals exchanging opinions. By placing further restrictions on the logic matrix, we also establish a set of Laplacian matrices which guarantee consensus for Model 1. The two models are also expanded to include stubborn individuals, who remain attached to their initial opinions. Sufficient conditions are obtained for guaranteeing convergence of the opinion dynamics system, with the final opinions generally being at a persistent disagreement. Simulations are provided to illustrate the results.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2018-10-27
Christian A. Schroeder de Witt; Jakob N. Foerster; Gregory Farquhar; Philip H. S. Torr; Wendelin Boehmer; Shimon Whiteson

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each others' observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.

Updated: 2020-01-14
• arXiv.cs.MA Pub Date : 2020-01-02
Asma Khatun; Sk. Golam Sarowar Hossain

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2020-01-10
Brionna Davis; Grace Jennings; Taylor Pothast; Ilias Gerostathopoulos; Evangelos Pournaras; Raphael E. Stern

New mobility concepts are at the forefront of research and innovation in smart cities. The introduction of connected and autonomous vehicles enables new possibilities in vehicle routing. Specifically, knowing the origin and destination of each agent in the network can allow for real-time routing of the vehicles to optimize network performance. However, this relies on individual vehicles being "altruistic", i.e., being willing to accept an alternative non-preferred route in order to achieve a network-level performance goal. In this work, we conduct a study to compare different levels of agent altruism and the resulting effect on the network-level traffic performance. Specifically, this study compares the effects of different underlying urban structures on the overall network performance, and investigates which characteristics of the network make it possible to realize routing improvements using a decentralized optimization router. The main finding is that, with increased vehicle altruism, it is possible to balance traffic flow among the links of the network. We show evidence that the decentralized optimization router is more effective in networks under high load. We also study the influence of city characteristics: networks with a higher number of nodes (intersections) or edges (roads) per unit area allow for more possible alternate routes, and thus have a higher potential to improve network performance.

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2020-01-04
Minghuan Liu; Ming Zhou; Weinan Zhang; Yuzheng Zhuang; Jun Wang; Wulong Liu; Yong Yu

In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods.

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2020-01-10
Gian Maria Campedelli; Francesco Calderoni; Mario Paolucci; Tommaso Comunale; Daniele Vilone; Federico Cecconi; Giulia Andrighetto

Criminal organizations exploit their presence on territories and local communities to recruit new workforce in order to carry out their criminal activities and business. The ability to attract individuals is crucial for maintaining power and control over the territories in which these groups are settled. This study proposes the formalization, development and analysis of an agent-based model (ABM) that simulates a neighborhood of Palermo (Sicily) with the aim to understand the pathways that lead individuals to recruitment into organized crime groups (OCGs). Using empirical data on social, economic and criminal conditions of the area under analysis, we use a multi-layer network approach to simulate this scenario. As the final goal, we test different policies to counter recruitment into OCGs. These scenarios are based on two different dimensions of prevention and intervention: (i) primary and secondary socialization and (ii) law enforcement targeting strategies.

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2018-01-04
Martin Lackner; Piotr Skowron

To choose a suitable multi-winner voting rule is a hard and ambiguous task. Depending on the context, it varies widely what constitutes the choice of an "optimal" subset of alternatives. In this paper, we offer a new perspective on measuring the quality of such subsets and---consequently---of multi-winner rules. We provide a quantitative analysis using methods from the theory of approximation algorithms and estimate how well multi-winner rules approximate two extreme objectives: a representation criterion defined via the Approval Chamberlin--Courant rule and a utilitarian criterion defined via Multi-winner Approval Voting. With both theoretical and experimental methods we classify multi-winner rules in terms of their quantitative alignment with these two opposing objectives. Our results provide fundamental information about the nature of multi-winner rules, and in particular about the necessary tradeoffs when choosing such a rule.
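
The tension between the two extreme objectives can be seen on a tiny approval profile. The sketch below uses the standard scoring definitions (Multi-winner Approval Voting counts approvals received by committee members; Approval Chamberlin-Courant counts voters with at least one approved member) and brute-forces all committees of size two. The ballots are invented for illustration and are not from the paper.

```python
from itertools import combinations

# Hypothetical approval ballots: three voters approve {a, b}, two approve {c}.
ballots = [{"a", "b"}] * 3 + [{"c"}] * 2
candidates = sorted(set().union(*ballots))


def av_score(committee):
    """Utilitarian objective: total number of approvals given to committee members."""
    return sum(len(ballot & set(committee)) for ballot in ballots)


def cc_score(committee):
    """Representation objective: number of voters approving at least one member."""
    return sum(1 for ballot in ballots if ballot & set(committee))


committees = list(combinations(candidates, 2))
av_winner = max(committees, key=av_score)  # first committee with the top AV score
cc_winner = max(committees, key=cc_score)  # first committee with the top CC score
```

Here the AV-optimal committee is `('a', 'b')`, which leaves two voters unrepresented, while a CC-optimal committee is `('a', 'c')`, which represents everyone but collects fewer approvals in total.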

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2019-09-21
Mark Rowland; Shayegan Omidshafiei; Karl Tuyls; Julien Perolat; Michal Valko; Georgios Piliouras; Remi Munos

This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called α-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or α-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.
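
The remark that Elo cannot handle intransitive interactions can be illustrated with a rock-paper-scissors cycle. The sketch below uses the standard logistic Elo expectation with a batch update per round; the K-factor of 16, the starting rating of 1500 and the batch scheme are assumptions chosen for the example, and none of this is the paper's own method.

```python
def expected(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_batch_round(ratings, results, k=16.0):
    """Apply one batch of (winner, loser) results using the pre-round ratings."""
    deltas = {name: 0.0 for name in ratings}
    for winner, loser in results:
        gain = k * (1.0 - expected(ratings[winner], ratings[loser]))
        deltas[winner] += gain
        deltas[loser] -= gain
    return {name: ratings[name] + deltas[name] for name in ratings}


ratings = {"rock": 1500.0, "paper": 1500.0, "scissors": 1500.0}
# Deterministic cycle: each agent beats exactly one other agent every round.
cycle = [("paper", "rock"), ("scissors", "paper"), ("rock", "scissors")]
for _ in range(100):
    ratings = elo_batch_round(ratings, cycle)
```

Each agent wins one match and loses one match per round, so the gains cancel and all three ratings stay tied. A single scalar rating therefore carries no information about the cycle, even though every pairwise outcome is fully predictable.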

Updated: 2020-01-13
• arXiv.cs.MA Pub Date : 2020-01-09
Mirko Salaris (Politecnico di Milano); Alessandro Riva (Politecnico di Milano); Francesco Amigoni (Politecnico di Milano)

Multirobot systems for covering environments are increasingly used in applications like cleaning, industrial inspection, patrolling, and precision agriculture. The problem of covering a given environment using multiple robots can be naturally formulated and studied as a multi-Traveling Salesperson Problem (mTSP). In an mTSP, the environment is represented as a graph and the goal is to find tours (starting and ending at the same depot) for the robots in order to visit all the vertices with minimum global cost, namely the length of the longest tour. The mTSP is an NP-hard problem for which several approximation algorithms have been proposed. These algorithms usually assume generic environments, but tighter approximation bounds can be reached by focusing on specific environments. In this paper, we address the case of environments composed of sub-parts, called modules, that can be reached from each other only through some linking structures. Examples are multi-floor buildings, in which the modules are the floors and the linking structures are the staircases or the elevators, and floors of large hotels or hospitals, in which the modules are the rooms and the linking structures are the corridors. We focus on linear modular environments, with the modules organized sequentially, presenting an efficient (with polynomial worst-case time complexity) algorithm that finds a solution for the mTSP whose cost is within a bounded distance from the cost of the optimal solution. The main idea of our algorithm is to allocate disjoint "blocks" of adjacent modules to the robots, in such a way that each module is covered by only one robot. We experimentally compare our algorithm against some state-of-the-art algorithms for solving mTSPs in generic environments and show that it provides solutions with lower makespan while requiring computing time several orders of magnitude shorter.
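
The block-allocation idea can be sketched as a contiguous partition problem. The code below is a deliberately simplified stand-in and not the authors' algorithm: it assumes each module has a single hypothetical coverage cost, ignores travel to and from the depot and through the linking structures, and uses a small dynamic program to split the module sequence into one block of adjacent modules per robot so that the most expensive block is as cheap as possible.

```python
from functools import lru_cache
from itertools import accumulate


def best_blocks(costs, robots):
    """Split `costs` into `robots` contiguous, non-empty blocks minimizing the largest block sum.

    Returns (makespan, blocks), where blocks are (start, end) index pairs with
    `end` exclusive. Assumes 1 <= robots <= len(costs).
    """
    n = len(costs)
    prefix = [0, *accumulate(costs)]  # prefix[i] = sum of costs[:i]

    @lru_cache(maxsize=None)
    def solve(start, left):
        # Best way to cover modules start..n-1 with `left` robots.
        if left == 1:
            return prefix[n] - prefix[start], ((start, n),)
        best = None
        # Leave at least one module for each of the remaining robots.
        for end in range(start + 1, n - left + 2):
            head = prefix[end] - prefix[start]
            tail_cost, tail_blocks = solve(end, left - 1)
            candidate = (max(head, tail_cost), ((start, end),) + tail_blocks)
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    return solve(0, robots)


# Hypothetical per-module coverage costs for six modules in a row, three robots.
makespan, blocks = best_blocks((4, 3, 2, 6, 1, 5), 3)
```

For these made-up costs the best split is modules 0-1, 2-3 and 4-5, with block costs 7, 8 and 6, so the makespan is 8. Each module belongs to exactly one block, matching the "covered by only one robot" property described above.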

Updated: 2020-01-10
• arXiv.cs.MA Pub Date : 2020-01-08
Marek Laskowski; Michael Zargham; Hjalmar Turesson; Henry M. Kim; Matt Barlin; Danil Kabanov; Eden Dhaliwal

We present a methodology for evidence-based design of cryptoeconomic systems, and elucidate a real-world example of how this methodology was used in the design of a blockchain network. This work provides a rare insight into the application of Data Science and Stochastic Simulation and Modelling to Token Engineering. We demonstrate how the described process can uncover previously unexpected system-level behaviors. Furthermore, it is observed that the process itself creates opportunities for the discovery of new knowledge and business understanding while developing the system from a high-level specification to one precise enough to be executed as a computational model. Discovery of performance issues at design time can spare costly emergency interventions that would be necessary if issues instead became apparent in a production network. For this reason, network designers are increasingly adopting evidence-based design practices, such as the one described herein.

Updated: 2020-01-10
• arXiv.cs.MA Pub Date : 2020-01-09
Vijay K. Garg

Let $L$ be any finite distributive lattice and $B$ be any boolean predicate defined on $L$ such that the set of elements satisfying $B$ is a sublattice of $L$. Consider any subset $M$ of $L$ consisting of $k$ elements that satisfy $B$. Then, we show that the $k$ generalized median elements generated from $M$ also satisfy $B$. We call this result the generalized median theorem on finite distributive lattices. When this result is applied to stable matching, we get Teo and Sethuraman's median stable matching theorem. Our proof is much simpler than that of Teo and Sethuraman. When the generalized median theorem is applied to the assignment problem, we get an analogous result for market clearing price vectors.
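
A small concrete instance may help. The sketch below assumes the lattice is a set of integer vectors under the componentwise order (meet is componentwise min, join is componentwise max) and takes the i-th generalized median to be the vector of i-th smallest values in each coordinate; both choices are assumptions made for illustration, since the abstract does not spell out the definition. The predicate `B` (first coordinate at most the second) is a hypothetical example whose satisfying set is closed under componentwise min and max.

```python
def generalized_medians(vectors):
    """Return k vectors; the i-th holds the i-th smallest value of every coordinate."""
    columns = [sorted(column) for column in zip(*vectors)]
    return [tuple(column[i] for column in columns) for i in range(len(vectors))]


def B(v):
    """Hypothetical sublattice predicate: closed under componentwise min and max."""
    return v[0] <= v[1]


M = [(1, 5), (3, 4), (2, 2)]  # three elements that all satisfy B
medians = generalized_medians(M)
```

The resulting vectors are `(1, 2)`, `(2, 4)` and `(3, 5)`. None of them belongs to `M`, yet all satisfy `B`, because each coordinatewise order statistic can be written using only componentwise min and max of the original elements.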

Updated: 2020-01-10
• arXiv.cs.MA Pub Date : 2020-01-08
Stefano Mariani; Giacomo Cabri; Franco Zambonelli

In the near future, our streets will be populated by myriads of autonomous self-driving vehicles to serve our diverse mobility needs. This will raise the need to coordinate their movements in order to properly handle both access to shared resources (e.g., intersections and parking slots) and the execution of mobility tasks (e.g., platooning and ramp merging). In this paper, we first introduce the general issues associated with the coordination of autonomous vehicles, by identifying and framing the key classes of coordination problems. We then overview the different approaches that can be adopted to manage such coordination problems, classifying them in terms of the degree of autonomy in decision making that is left to autonomous vehicles during coordination. Finally, we overview some further peculiar challenges that research will have to address before autonomously coordinated vehicles can safely hit our streets.

Updated: 2020-01-09
• arXiv.cs.MA Pub Date : 2020-01-07
Supratik Mukhopadhyay; Qun Liu; Edward Collier; Yimin Zhu; Ravindra Gudishala; Chanachok Chokwitthaya; Robert DiBiano; Alimire Nabijiang; Sanaz Saeidi; Subhajit Sidhanta; Arnab Ganguly

It has recently become widely accepted by the research community that interactions between humans and cyber-physical infrastructures play a significant role in determining the performance of the latter. The existing paradigm for designing cyber-physical systems for optimal performance focuses on developing models based on historical data. The impacts of context factors driving human-system interaction are difficult to capture and replicate in existing design models. As a result, many existing models address the context factors of a new design only partially or not at all, owing to the lack of capabilities to capture them. This limitation often causes performance gaps between predicted and measured results. We envision a new design environment, a cyber-physical human system (CPHS), where decision-making processes for physical infrastructures under design are intelligently connected to distributed resources over cyberinfrastructure, such as experiments on design features and empirical evidence from operations of existing instances. The framework combines existing design models with context-aware design-specific data involving human-infrastructure interactions in new designs, using a machine learning approach to create augmented design models with improved predictive powers.

Updated: 2020-01-08
• arXiv.cs.MA Pub Date : 2020-01-07
Roula Nassif; Stefan Vlaski; Cedric Richard; Jie Chen; Ali H. Sayed

Updated: 2020-01-08
• arXiv.cs.MA Pub Date : 2020-01-04
Weiya Ren

In this paper, we consider the problem of large-scale multi-agent reinforcement learning. First, we study the representation problem of the pairwise value function to reduce the complexity of the interactions among agents. Second, we adopt an l2-norm trick to ensure that the trivial term of the approximated value function is bounded. Third, experimental results on a battle game demonstrate the effectiveness of the proposed approach.

Updated: 2020-01-07
• arXiv.cs.MA Pub Date : 2019-05-06
Aris Filos-Ratsikas; Evi Micha; Alexandros A. Voudouris

Voting can abstractly model any decision-making scenario and as such it has been extensively studied over the decades. Recently, the related literature has focused on quantifying the impact of utilizing only limited information in the voting process on the societal welfare for the outcome, by bounding the distortion of voting rules. Even though there has been significant progress towards this goal, all previous works have so far neglected the fact that in many scenarios (like presidential elections) voting is actually a distributed procedure. In this paper, we consider a setting in which the voters are partitioned into disjoint districts and vote locally therein to elect local winning alternatives using a voting rule; the final outcome is then chosen from the set of these alternatives. We prove tight bounds on the distortion of well-known voting rules for such distributed elections both from a worst-case perspective as well as from a best-case one. Our results indicate that the partition of voters into districts leads to considerably higher distortion, a phenomenon which we also experimentally showcase using real-world data.
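
As a rough illustration of the district effect described above, the following sketch runs plurality inside each district and then plurality over the district winners, and compares the winner's social welfare with the best achievable welfare on one hand-made instance. The two-level plurality rule, the utility values, and all function names are assumptions made for this example; the ratio computed here is a single-instance welfare ratio, not the worst-case distortion bound proved in the paper.

```python
from collections import Counter

def plurality(ballots):
    """Return the alternative named most often in a list of top choices."""
    return Counter(ballots).most_common(1)[0][0]

def district_election(districts):
    """Elect a local winner per district, then pick among the local winners."""
    local_winners = []
    for voters in districts:
        # Each voter reports only the alternative with the highest utility.
        tops = [max(u, key=u.get) for u in voters]
        local_winners.append(plurality(tops))
    return plurality(local_winners)

def social_welfare(districts, alternative):
    """Sum of all voters' utilities for one alternative."""
    return sum(u[alternative] for voters in districts for u in voters)

def welfare_ratio(districts):
    """Return (winner, best welfare / winner's welfare) for this instance."""
    winner = district_election(districts)
    alternatives = districts[0][0].keys()
    best = max(social_welfare(districts, a) for a in alternatives)
    return winner, best / social_welfare(districts, winner)

# Hypothetical instance: A narrowly wins two districts, B is strongly
# preferred by everyone else.
weak_a = {"A": 0.51, "B": 0.49}
strong_b = {"A": 0.0, "B": 1.0}
districts = [
    [weak_a, weak_a, strong_b],
    [weak_a, weak_a, strong_b],
    [strong_b, strong_b, strong_b],
]

if __name__ == "__main__":
    print(welfare_ratio(districts))
```

In this instance A wins two of three districts even though five of the nine voters rank B first and B has far higher total utility, which is the kind of loss the districting results quantify.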

Updated: 2020-01-07
• arXiv.cs.MA Pub Date : 2019-07-18
Georgios Amanatidis; Georgios Birmpas; Aris Filos-Ratsikas; Alexandros A. Voudouris

Aggregating the preferences of individuals into a collective decision is the core subject of study of social choice theory. In 2006, Procaccia and Rosenschein considered a utilitarian social choice setting, where the agents have explicit numerical values for the alternatives, yet they only report their linear orderings over them. To compare different aggregation mechanisms, Procaccia and Rosenschein introduced the notion of distortion, which quantifies the inefficiency of using only ordinal information when trying to maximize the social welfare, i.e., the sum of the underlying values of the agents for the chosen outcome. Since then, this research area has flourished and bounds on the distortion have been obtained for a wide variety of fundamental scenarios. However, the vast majority of the existing literature is focused on the case where nothing is known beyond the ordinal preferences of the agents over the alternatives. In this paper, we take a more expressive approach, and consider mechanisms that are allowed to further ask a few cardinal queries in order to gain partial access to the underlying values that the agents have for the alternatives. With this extra power, we design new deterministic mechanisms that achieve significantly improved distortion bounds and, in many cases, outperform the best-known randomized ordinal mechanisms. We paint an almost complete picture of the number of queries required to achieve specific distortion bounds.

Updated: 2020-01-07
• arXiv.cs.MA Pub Date : 2019-11-28
Jhelum Chakravorty; Nadeem Ward; Julien Roy; Maxime Chevalier-Boisvert; Sumana Basu; Andrei Lupu; Doina Precup

In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems using the options framework (Sutton et al, 1999) and provide a model-free algorithm for this problem. First, we address the planning problem for the decentralized POMDP represented by the multi-agent system, by introducing a common information approach. We use common beliefs and broadcasting to solve an equivalent centralized POMDP problem. Then, we propose the Distributed Option Critic (DOC) algorithm, motivated by the work of Bacon et al (2017) in the single-agent setting. Our approach uses centralized option evaluation and decentralized intra-option improvement. We analyze theoretically the asymptotic convergence of DOC and validate its performance in grid-world environments, where we implement DOC using a deep neural network. Our experiments show that DOC performs competitively with state-of-the-art algorithms and that it is scalable when the number of agents increases.

Updated: 2020-01-07
• arXiv.cs.MA Pub Date : 2019-12-18
Leonit Zeynalvand; Tie Luo; Jie Zhang

Trust and reputation management (TRM) plays an increasingly important role in large-scale online environments such as multi-agent systems (MAS) and the Internet of Things (IoT). One main objective of TRM is to achieve accurate trust assessment of entities such as agents or IoT service providers. However, this encounters an accuracy-privacy dilemma, which we identify in this paper; to address it, we propose a framework called Context-aware Bernoulli Neural Network based Reputation Assessment (COBRA). COBRA encapsulates agent interactions or transactions, which are prone to privacy leaks, in machine learning models, and aggregates multiple such models using a Bernoulli neural network to predict a trust score for an agent. COBRA preserves agent privacy and retains interaction contexts via the machine learning models, and achieves more accurate trust prediction than a fully-connected neural network alternative. COBRA is also robust to security attacks by agents who inject fake machine learning models; notably, it is resistant to the 51-percent attack. The performance of COBRA is validated by our experiments using a real dataset, and by our simulations, where we also show that COBRA outperforms other state-of-the-art TRM systems.

Updated: 2020-01-07
• arXiv.cs.MA Pub Date : 2020-01-03
Alessandro Paolo Capasso; Giulio Bacchiani; Daniele Molinari

An important topic in autonomous driving research is the development of maneuver planning systems. Vehicles have to interact and negotiate with each other so that optimal choices, in terms of time and safety, are made. For this purpose, we present a maneuver planning module able to negotiate entry into busy roundabouts. The proposed module is based on a neural network trained to predict when and how to enter the roundabout throughout the whole duration of the maneuver. Our model is trained with a novel implementation of A3C, which we call Delayed A3C (D-A3C), in a synthetic environment where vehicles move in a realistic manner with interaction capabilities. In addition, the system is trained such that agents feature a unique tunable behavior, emulating real-world scenarios where drivers have their own driving styles. Similarly, the maneuver can be performed using different aggressiveness levels, which is particularly useful for managing busy scenarios where conservative rule-based policies would result in indefinite waits.

Updated: 2020-01-06
• arXiv.cs.MA Pub Date : 2020-01-03
Korosh Mahmoodi; Bruce J. West; Cleotilde Gonzalez

We propose a model for demonstrating spontaneous emergence of collective intelligent behavior from selfish individual agents. Agents' behavior is modeled using our proposed selfish algorithm (SA) with three learning mechanisms: reinforced learning (SAL), trust (SAT) and connection (SAC). Each of these mechanisms provides a distinctly different way an agent can increase the individual benefit accrued through playing the prisoner's dilemma game (PDG) with other agents. The SA provides a generalization of the self-organized temporal criticality (SOTC) model and shows that self-interested individuals can simultaneously produce maximum social benefit from their decisions. The mechanisms in the SA are self-tuned by the internal dynamics and without having a pre-established network structure. Our results demonstrate emergence of mutual cooperation, emergence of dynamic networks, and adaptation and resilience of social systems after perturbations. The implications and applications of the SA are discussed.

Updated: 2020-01-06
• arXiv.cs.MA Pub Date : 2019-12-14
Wenhang Bao

Unfair stock trading strategies have been shown to be one of the most negative perceptions that customers can have concerning trading and may result in long-term losses for a company. Investment banks usually place trading orders for multiple clients with the same target assets but different order sizes and diverse requirements such as time frame and risk aversion level, so total earnings and individual earnings cannot be optimized at the same time. Orders executed earlier affect the market price level, so late execution usually means additional implementation cost. In this paper, we propose a novel scheme that utilizes multi-agent reinforcement learning systems to derive stock trading strategies for all clients that keep a balance between revenue and fairness. First, we demonstrate that reinforcement learning (RL) is able to learn from experience and adapt the trading strategies to the complex market environment. Second, we show that the multi-agent RL system allows developing trading strategies for all clients individually, thus optimizing individual revenue. Third, we use the Generalized Gini Index (GGI) aggregation function to control the fairness level of the revenue across all clients. Lastly, we empirically demonstrate the superiority of the novel scheme in improving fairness while maintaining optimization of revenue.
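
For readers unfamiliar with the GGI, a minimal sketch of the aggregation function follows: utilities are sorted in ascending order and combined with strictly decreasing positive weights, so the worst-off client counts the most. The weight vector and revenue figures below are hypothetical, and this shows only the aggregation step, not how the paper embeds it in the multi-agent RL objective.

```python
def ggi(utilities, weights):
    """Generalized Gini Index: weighted sum of utilities sorted ascending.

    Weights must be positive and strictly decreasing, which makes the
    score favour more balanced utility vectors.
    """
    if len(utilities) != len(weights):
        raise ValueError("utilities and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if any(a <= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly decreasing")
    # The smallest utility is paired with the largest weight.
    return sum(w * u for w, u in zip(weights, sorted(utilities)))

if __name__ == "__main__":
    w = [0.5, 0.3, 0.2]          # hypothetical weights
    print(ggi([3.0, 3.0, 3.0], w))  # equal split of a total revenue of 9
    print(ggi([1.0, 3.0, 5.0], w))  # same total, unequal split scores lower
```

Because the input is sorted before weighting, the score does not depend on which client receives which revenue, only on the distribution.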

Updated: 2020-01-06
• arXiv.cs.MA Pub Date : 2019-05-15
Anthony P. Young; David Kohan Marzagao; Josh Murphy

We apply ideas from abstract argumentation theory to study cooperative game theory. Building on Dung's results in his seminal paper, we further the correspondence between Dung's four argumentation semantics and solution concepts in cooperative game theory by showing that complete extensions (the grounded extension) correspond to Roth's subsolutions (respectively, the supercore). We then investigate the relationship between well-founded argumentation frameworks and convex games, where in each case the semantics (respectively, solution concepts) coincide; we prove that three-player convex games do not in general have well-founded argumentation frameworks.

Updated: 2020-01-06
• arXiv.cs.MA Pub Date : 2019-12-31

We propose a controlled simulation within a competitive zero-sum environment as a proxy for disaggregating components of success. Given a simulation of the Risk board game, we consider (a) talent to be one of three rule-based strategies; (b) context as the setting of each run with opponents' strategies, goals and luck; and (c) perspective as the objective of each player. Success is attained when a first player conquers its goal. We simulate 100,000 runs of an agent-based model and analyze the results. The simulation results strongly suggest that luck, talent and context are all relevant in determining success. Perspective -- as the description of the goal that defines success -- is not. As such, we present a quantitative, reproducible environment in which we are able to significantly separate the concepts, reproducing previous results and adding arguments for context and perspective. Finally, we also find that resilience and opportunity might be examined within the simulation provided.

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2020-01-01

In this brief paper, a new consensus protocol based on the sign of innovations is proposed. Under this protocol, each agent requires only a single bit of information about its state relative to each neighboring agent. This is significant in real-time applications, since it reduces the computation and/or communication load on agents. Using the Lyapunov stability theorem, convergence is proved for networks having a spanning tree. Further, the convergence is shown to occur in finite time, which is significant compared to most asymptotic protocols in the literature. Time-variant network topologies are also considered, and the final consensus value is derived for undirected networks. Applications of the proposed consensus protocol in (i) 2D/3D rendezvous tasks, (ii) distributed estimation, (iii) distributed optimization, and (iv) formation control are considered, and the significance of applying this protocol is discussed. Numerical simulations are provided to compare the protocol with existing protocols in the literature.
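
To make the single-bit idea concrete, here is a minimal simulation sketch of one plausible sign-based update, in which each agent moves according to the sum of the signs of its neighbors' relative states. The update rule, the Euler step size `dt`, the three-agent path graph, and all names are assumptions for illustration and may differ from the protocol in the paper. In discrete time the states settle into a small band on the order of the step size rather than meeting exactly.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def step(x, neighbors, dt):
    """One Euler step: agent i only uses sign(x[j] - x[i]) per neighbor j."""
    return [
        xi + dt * sum(sign(x[j] - xi) for j in neighbors[i])
        for i, xi in enumerate(x)
    ]

def simulate(x0, neighbors, dt=0.01, steps=2000):
    """Iterate the sign-based update from the initial states x0."""
    x = list(x0)
    for _ in range(steps):
        x = step(x, neighbors, dt)
    return x

if __name__ == "__main__":
    # Hypothetical undirected path graph 0 - 1 - 2.
    path = {0: [1], 1: [0, 2], 2: [1]}
    final = simulate([0.0, 1.0, 3.0], path)
    print(final, max(final) - min(final))
```

On an undirected graph the signed terms cancel pairwise, so the average of the states is preserved and the agents gather around the initial mean.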

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2020-01-02
Yunus Emre Sahin; Necmiye Ozay

In this paper, we consider the multi-robot path execution problem where a group of robots move on predefined paths from their initial to target positions while avoiding collisions and deadlocks in the face of asynchrony. We first show that this problem can be reformulated as a distributed resource allocation problem and, in particular, as an instance of the well-known Drinking Philosophers Problem (DrPP). By careful construction of the drinking sessions capturing shared resources, we show that any existing solution to DrPP can be used to design robot control policies that are collectively collision-free and deadlock-free. We then propose modifications to an existing DrPP algorithm to allow more concurrent behavior, and provide conditions under which our method is deadlock-free. Our method does not require robots to know or to estimate the speed profiles of other robots, and results in distributed control policies. We demonstrate the efficacy of our method on simulation examples, which show competitive performance against the state of the art.

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2020-01-02
S. Rasoul Etesami; Negar Kiyavash; H. Vincent Poor

We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting, where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting the false prediction is asymptotically optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$, where $N$ is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting, where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^2)$ time. Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system.
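
The setting can be sketched with a small simulation of a multiplicative weight rule facing one expert that always reports the false prediction. The linear update $w_i \leftarrow w_i (1 - \eta\,|a_i - y|)$, the learning rate `eta`, the binary outcomes, and the assumption that the remaining experts are always correct are illustrative choices made here; the paper's exact MW variant and expert model may differ.

```python
def mw_predict(weights, advice):
    """Weighted average of the experts' advice."""
    return sum(w * a for w, a in zip(weights, advice)) / sum(weights)

def mw_update(weights, advice, outcome, eta=0.5):
    """Shrink each weight in proportion to that expert's absolute error."""
    return [w * (1.0 - eta * abs(a - outcome)) for w, a in zip(weights, advice)]

def run(outcomes, honest_experts=2, eta=0.5):
    """Simulate MW with perfect honest experts and one always-wrong expert.

    Returns the aggregate absolute loss and the final weights; the
    malicious expert is the last entry.
    """
    weights = [1.0] * (honest_experts + 1)
    total_loss = 0.0
    for y in outcomes:
        # Honest experts report y; the malicious one reports the opposite.
        advice = [float(y)] * honest_experts + [1.0 - y]
        prediction = mw_predict(weights, advice)
        total_loss += abs(prediction - y)
        weights = mw_update(weights, advice, y, eta)
    return total_loss, weights

if __name__ == "__main__":
    loss, weights = run([1, 0, 1, 1])
    print(loss, weights)
```

Under these assumptions the greedy attacker does damage in early rounds, but its weight halves after every wrong report, so its per-round impact decays; the paper's analysis concerns how close such a greedy policy is to the optimal attack.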

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2020-01-02
Ahmed A. Hussein; Hesham A. Rakha

In this paper, empirical data from the literature are used to develop general power models that capture the impact of a vehicle's position in a platoon of homogeneous vehicles, and of the distance gap to its lead (and following) vehicle, on its drag coefficient. These models are developed for light duty vehicles, buses, and heavy duty trucks. The models were fit using a constrained optimization framework that fits a general power function to either direct drag force or fuel measurements. The models are then used to extrapolate the empirical measurements to a wide range of vehicle distance gaps within a platoon. Using these models, we estimate the potential fuel reduction associated with homogeneous platoons of light duty vehicles, buses, and heavy duty trucks. The results show a significant reduction in vehicle fuel consumption when compared with estimates based on a constant drag coefficient assumption. Specifically, considering a minimum time gap between vehicles of 0.5 s (which is typical considering state-of-practice communication and mechanical system latencies) at a speed of 100 km/h, the optimum fuel reduction achieved is 4.5%, 15.5%, and 7.0% for light duty vehicle, bus, and heavy duty truck platoons, respectively. For longer time gaps, the bus and heavy duty truck platoons still produce fuel reductions on the order of 9.0% and 4.5%, whereas light duty vehicles produce negligible fuel savings.

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2020-01-02
Faheem Zafari; Kin K. Leung; Don Towsley; Prithwish Basu; Ananthram Swami; Jian Li

Mobile edge computing seeks to provide resources to different delay-sensitive applications. This is a challenging problem as an edge cloud-service provider may not have sufficient resources to satisfy all resource requests. Furthermore, allocating available resources optimally to different applications is also challenging. Resource sharing among different edge cloud-service providers can address the aforementioned limitation, as certain service providers may have resources available that can be "rented" by other service providers. However, edge cloud-service providers can have different objectives or utilities. Therefore, there is a need for an efficient and effective mechanism to share resources among service providers, while considering the different objectives of various providers. We model resource sharing as a multi-objective optimization problem and present a solution framework based on Cooperative Game Theory (CGT). We consider the strategy where each service provider allocates resources to its native applications first and shares the remaining resources with applications from other service providers. We prove that for a monotonic, non-decreasing utility function, the game is canonical and convex. Hence, the core is not empty and the grand coalition is stable. We propose two algorithms, Game-theoretic Pareto optimal allocation (GPOA) and Polyandrous-Polygamous Matching based Pareto Optimal Allocation (PPMPOA), that provide allocations from the core. Hence the obtained allocations are Pareto optimal and the grand coalition of all the service providers is stable. Experimental results confirm that our proposed resource sharing framework improves utilities of edge cloud-service providers and application request satisfaction.

Updated: 2020-01-04
• arXiv.cs.MA Pub Date : 2018-02-16
Ismael T. Freire; Clement Moulin-Frier; Marti Sanchez-Fibla; Xerxes D. Arsiwalla; Paul Verschure

What is the role of real-time control and learning in the formation of social conventions? To answer this question, we propose a computational model that matches human behavioral data in a social decision-making game that was analyzed both in discrete-time and continuous-time setups. Furthermore, unlike previous approaches, our model takes into account the role of sensorimotor control loops in embodied decision-making scenarios. For this purpose, we introduce the Control-based Reinforcement Learning (CRL) model. CRL is grounded in the Distributed Adaptive Control (DAC) theory of mind and brain, where low-level sensorimotor control is modulated through perceptual and behavioral learning in a layered structure. CRL follows these principles by implementing a feedback control loop handling the agent's reactive behaviors (pre-wired reflexes), along with an adaptive layer that uses reinforcement learning to maximize long-term reward. We test our model in a multi-agent game-theoretic task in which coordination must be achieved to find an optimal solution. We show that CRL is able to reach human-level performance on standard game-theoretic metrics such as efficiency in acquiring rewards and fairness in reward distribution.

Updated: 2020-01-04
Contents have been reproduced by permission of the publishers.
