Current journal: arXiv - CS - Computer Science and Game Theory
  • Signalling Acts of Punishment Promotes the Emergence of Cooperation and Enhanced Social Welfare in Evolutionary Games
    arXiv.cs.GT Pub Date : 2020-01-22
    Theodor Cimpeanu; The Anh Han

    Social punishment has been suggested as a key approach to ensuring high levels of cooperation and norm compliance in one-shot (i.e. non-repeated) interactions. However, it has been shown that it only works when punishment is highly cost-efficient. On the other hand, signalling retribution hearkens back to medieval sovereignty, insofar as the very word for gallows in French stems from the Latin word for power and serves as a grim symbol of the ruthlessness of high justice. Here we introduce the mechanism of signalling an act of punishment and a special type of defector emerges, one who can recognise this signal and avoid punishment by way of fear. We describe the analytical conditions under which threat signalling can maintain high levels of cooperation. Moreover, we perform extensive agent-based simulations so as to confirm and expand our understanding of the external factors that influence the success of social punishment. We show that our suggested mechanism serves as a catalyst for cooperation, even when signalling is costly or when punishment would be impractical. We observe the preventive nature of advertising retributive acts and we contend that the resulting social prosperity is a desirable outcome in the contexts of AI and multi-agent systems. To conclude, we argue that fear acts as an effective stimulus to pro-social behaviour.

  • A Coalition Formation Game Framework for Peer-to-Peer Energy Trading
    arXiv.cs.GT Pub Date : 2019-12-24
    Wayes Tushar; Tapan K. Saha; Chau Yuen; M. Imran Azim; Thomas Morstyn; H. Vincent Poor; Dustin Niyato; Richard Bean

    This paper studies a social cooperation backed peer-to-peer energy trading technique by which prosumers can decide how to use their batteries opportunistically when participating in peer-to-peer trading. The objective is to achieve a solution in which the ultimate beneficiaries are the prosumers, i.e., a prosumer-centric solution. To do so, a coalition formation game is designed, which enables a prosumer to compare its benefit of participating in the peer-to-peer trading with and without using its battery and thus allows the prosumer to form suitable social coalition groups with other similar prosumers in the network for conducting peer-to-peer trading. The properties of the formed coalitions are studied, and it is shown that 1) the coalition structure that stems from the social cooperation between participating prosumers at each time slot is both stable and optimal, and 2) the outcomes of the proposed peer-to-peer trading scheme are prosumer-centric. Case studies are conducted based on real household energy usage and solar generation data to highlight how the proposed scheme can benefit prosumers through exhibiting prosumer-centric properties.

  • Geometrical Regret Matching
    arXiv.cs.GT Pub Date : 2019-08-18
    Sizhong Lan

    We argue that the existing regret matchings for Nash equilibrium approximation conduct "jumpy" strategy updating when the probabilities of future plays are set to be proportional to positive regret measures. We propose a geometrical regret matching which features "smooth" strategy updating. Our approach is simple, intuitive and natural. The analytical and numerical results show that continuously and "smoothly" suppressing "unprofitable" pure strategies is sufficient for the game to evolve towards Nash equilibrium, suggesting that in reality the tendency for equilibrium could be pervasive and irresistible. Technically, iterative regret matching gives rise to a sequence of adjusted mixed strategies, and we study how this sequence approximates the true equilibrium point. The sequence can be studied in metric space and visualized nicely as a clear path towards an equilibrium point. Our theory has limitations in optimizing the approximation accuracy.
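
    For orientation, the update that the abstract calls "jumpy" can be sketched as follows. This is a minimal sketch of standard regret matching (play proportional to positive regret), not the geometrical variant proposed in the paper; the rock-paper-scissors payoff table and the names `PAYOFF`, `strategy_from_regrets` and `self_play` are illustrative assumptions.

```python
import random

# Assumed example game: payoff to a player choosing action a against b
# in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Standard regret matching: probabilities proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret: fall back to uniform play
    return [p / total for p in positive]


def self_play(iterations=5000, seed=0):
    """Both players run regret matching; returns player 0's average strategy."""
    rng = random.Random(seed)
    n = 3
    regrets = [[0.0] * n, [0.0] * n]
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        actions = [rng.choices(range(n), weights=s)[0] for s in strategies]
        for i in range(n):
            strategy_sum[i] += strategies[0][i]
        for player in (0, 1):
            mine, theirs = actions[player], actions[1 - player]
            for alt in range(n):
                # Regret of not having played `alt` instead of `mine`.
                regrets[player][alt] += PAYOFF[alt][theirs] - PAYOFF[mine][theirs]
    return [s / iterations for s in strategy_sum]
```

    Because an action's probability drops to exactly zero as soon as its cumulative regret turns non-positive, the per-round strategy can change abruptly between rounds, which is the behaviour the abstract contrasts with "smooth" updating.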

  • A Price-Per-Attention Auction Scheme Using Mouse Cursor Information
    arXiv.cs.GT Pub Date : 2020-01-21
    Ioannis Arapakis; Antonio Penta; Hideo Joho; Luis A. Leiva

    Payments in online ad auctions are typically derived from click-through rates, so that advertisers do not pay for ineffective ads. But advertisers often care about more than just clicks, for example when they aim to raise brand awareness or visibility. There is thus an opportunity to devise a more effective ad pricing paradigm, in which ads are paid for only if they are actually noticed. This article contributes a novel auction format based on a pay-per-attention (PPA) scheme. We show that the PPA auction inherits the same desirable properties (strategy-proofness and efficiency) as its pay-per-impression and pay-per-click counterparts, and that it also compares favourably in terms of revenues. To make the PPA format feasible, we also contribute a scalable diagnostic technology to predict user attention to ads in sponsored search using raw mouse cursor coordinates only, regardless of the page content and structure. We use the user attention predictions in numerical simulations to evaluate the PPA auction scheme. Our results show that, in relevant economic settings, the PPA revenues would be strictly higher than those of the existing auction payment schemes.

  • Bounding Fixed Points of Set-Based Bellman Operator and Nash Equilibria of Stochastic Games
    arXiv.cs.GT Pub Date : 2020-01-22
    Sarah H. Q. Li; Assalé Adjé; Pierre-Loïc Garoche; Behçet Açıkmeşe

    Motivated by uncertain parameters encountered in Markov decision processes (MDPs) and stochastic games, we study the effect of parameter uncertainty on Bellman operator-based algorithms under a set-based framework. Specifically, we first consider a family of MDPs where the cost parameters are in a given compact set; we then define a Bellman operator acting on a set of value functions to produce a new set of value functions as the output under all possible variations in the cost parameter. We prove the existence of a fixed point of this set-based Bellman operator by showing that it is contractive on a complete metric space, and explore its relationship with the corresponding family of MDPs and stochastic games. Additionally, we show that given interval set bounded cost parameters, we can form exact bounds on the set of optimal value functions. Finally, we utilize our results to bound the value function trajectory of a player in a stochastic game.
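
    The interval-bound claim can be illustrated with a small sketch. It relies only on the fact that the cost-minimising Bellman operator is monotone in the cost parameter, so running value iteration on the lower and upper cost bounds brackets the optimal value function for every cost in between. This is not the authors' set-based operator; the two-state MDP, the discount factor and the name `value_iteration` are hypothetical.

```python
def value_iteration(costs, transitions, gamma=0.9, tol=1e-10):
    """Cost-minimising value iteration.

    costs[s][a] is the stage cost of action a in state s;
    transitions[s][a][t] is the probability of moving from s to t under a.
    """
    n = len(costs)
    values = [0.0] * n
    while True:
        new_values = [
            min(
                costs[s][a]
                + gamma * sum(p * values[t] for t, p in enumerate(transitions[s][a]))
                for a in range(len(costs[s]))
            )
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values


# Hypothetical 2-state, 2-action MDP with interval-bounded costs.
transitions = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
cost_low = [[1.0, 2.0], [0.5, 1.0]]
cost_high = [[1.5, 2.5], [1.0, 3.0]]
cost_mid = [[1.2, 2.2], [0.7, 2.0]]  # any cost inside the interval

v_low = value_iteration(cost_low, transitions)
v_high = value_iteration(cost_high, transitions)
v_mid = value_iteration(cost_mid, transitions)
```

    In this sketch `v_mid` lies between `v_low` and `v_high` componentwise, which is the kind of bound the abstract describes for interval cost sets.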

  • A utility-based analysis of equilibria in multi-objective normal form games
    arXiv.cs.GT Pub Date : 2020-01-17
    Roxana Rădulescu; Patrick Mannion; Yijie Zhang; Diederik M. Roijers; Ann Nowé

    In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their payoff vectors to scalar utility values. This utility-based approach naturally leads to two different optimisation criteria for agents in a MOMAS: expected scalarised returns (ESR) and scalarised expected returns (SER). In this article, we explore the differences between these two criteria using the framework of multi-objective normal form games (MONFGs). We demonstrate that the choice of optimisation criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when non-linear utility functions are used.
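
    The difference between the two criteria is just the order of expectation and utility. A minimal sketch, assuming a hypothetical two-objective lottery and a hypothetical non-linear utility (the product of the two objectives), neither of which comes from the paper:

```python
def utility(payoff):
    """Hypothetical non-linear utility: product of the two objectives."""
    x, y = payoff
    return x * y


def expected_scalarised_return(lottery):
    """ESR: apply the utility to each outcome, then take the expectation."""
    return sum(prob * utility(payoff) for prob, payoff in lottery)


def scalarised_expected_return(lottery):
    """SER: take the expected payoff vector, then apply the utility."""
    dims = len(lottery[0][1])
    expected = tuple(
        sum(prob * payoff[d] for prob, payoff in lottery) for d in range(dims)
    )
    return utility(expected)


# Each entry is (probability, payoff vector).
lottery = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]

esr = expected_scalarised_return(lottery)  # 0.5 * 0 + 0.5 * 0 = 0.0
ser = scalarised_expected_return(lottery)  # utility((0.5, 0.5)) = 0.25
```

    With a linear utility the two values would coincide; the gap appears only because the utility is non-linear, which matches the condition stated in the abstract.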

  • Complexity, Stability Properties of Mixed Games and Dynamic Algorithms, and Learning in the Sharing Economy
    arXiv.cs.GT Pub Date : 2020-01-18
    Michael C. Nwogugu

    The Sharing Economy (which includes Airbnb, Apple, Alibaba, Uber, WeWork, Ebay, Didi Chuxing, Amazon) blossomed across the world, triggered structural changes in industries and significantly affected international capital flows primarily by disobeying a wide variety of statutes and laws in many countries. They also illegally reduced and changed the nature of competition in many industries, often to the detriment of social welfare. This article develops new dynamic pricing models for the SEOs and derives some stability properties of mixed games and dynamic algorithms which eliminate antitrust liability and also reduce deadweight losses, greed, Regret and GPS manipulation. The new dynamic pricing models contravene the Myerson Satterthwaite Impossibility Theorem.

  • Tight Revenue Gaps among Simple Mechanisms
    arXiv.cs.GT Pub Date : 2018-04-02
    Yaonan Jin; Pinyan Lu; Zhihao Gavin Tang; Tao Xiao

    We consider a fundamental problem in microeconomics: selling a single item to a number of potential buyers, whose values are drawn from known independent and regular (not necessarily identical) distributions. There are four widely-used and widely-studied mechanisms in the literature: Myerson Auction (OPT), Sequential Posted-Pricing (SPM), Second-Price Auction with Anonymous Reserve (AR), and Anonymous Pricing (AP). OPT is revenue-optimal but complicated, which also experiences several issues in practice such as fairness; AP is the simplest mechanism, but also generates the lowest revenue among these four mechanisms; SPM and AR are of intermediate complexity and revenue. We explore revenue gaps among these mechanisms, each of which is defined as the largest ratio between revenues from a pair of mechanisms. We establish two tight bounds and one improved bound: 1. SPM vs. AP: this ratio studies the power of discrimination in pricing schemes. We obtain the tight ratio of $\mathcal{C^*} \approx 2.62$, closing the gap between $\big[\frac{e}{e - 1}, e\big]$ left before. 2. AR vs. AP: this ratio measures the relative power of auction scheme vs. pricing scheme, when no discrimination is allowed. We attain the tight ratio of $\frac{\pi^2}{6} \approx 1.64$, closing the previously known bounds $\big[\frac{e}{e - 1}, e\big]$. 3. OPT vs. AR: this ratio quantifies the power of discrimination in auction schemes, and is previously known to be somewhere between $\big[2, e\big]$. The lower-bound of $2$ was conjectured to be tight by Hartline and Roughgarden (2009) and Alaei et al. (2015). We acquire a better lower-bound of $2.15$, and thus disprove this conjecture.

  • HEB: Hybrid Expenditure Blockchain
    arXiv.cs.GT Pub Date : 2019-11-11
    Itay Tsabary; Alexander Spiegelman; Ittay Eyal

    The study of Proof of Work (PoW) has culminated with the introduction of cryptocurrency blockchains like Bitcoin. These protocols require their operators, called miners, to expend computational resources and they reward them with minted cryptocurrency tokens. The system is secure from attackers who cannot expend resources at a rate equivalent to that of all benign miners. But the resource requirement is arbitrary: it is the product of the number of minted tokens and their real value. We present Hybrid Expenditure Blockchain (HEB), a novel cryptocurrency PoW protocol that allows its designer to tune external expenditure. To the best of our knowledge, this is the first tunable PoW protocol. Despite the reduced resource expenditure, it maintains the security guarantees of pure PoW protocols against rational attacks. HEB has practical implications, as global power expenditure on PoW blockchains exceeds that of a medium-sized country. Applying HEB in operational PoW systems can significantly reduce their ecological footprint.

  • Personalized Pareto-Improving Pricing Schemes with Truthfulness Guarantees: An Alternative Approach to Congestion Pricing
    arXiv.cs.GT Pub Date : 2019-12-19
    Aristotelis-Angelos Papadopoulos; Ioannis Kordonis; Maged M. Dessouky; Petros A. Ioannou

    We design a coordination mechanism for truck drivers that uses pricing schemes to alleviate traffic congestion in a general transportation network. We consider the user heterogeneity in Value-Of-Time (VOT) by adopting a multi-class model with stochastic Origin-Destination (OD) demands for the truck drivers. A basic characteristic of the mechanism is that the coordinator asks the truck drivers to declare their desired OD pair, as well as their individual VOT from a set of $N$ available options, and guarantees that the resulting pricing scheme is Pareto-improving, i.e. every truck driver will be better-off compared to the User Equilibrium (UE) and that every truck driver will have an incentive to truthfully declare his/her VOT, while leading to a revenue-neutral (budget balanced) on average mechanism. We show that the Optimum Pricing Scheme (OPS) can be calculated by solving a nonconvex optimization problem. To achieve computational efficiency, we additionally propose an Approximately Optimum Pricing Scheme (AOPS) and we prove that it satisfies the aforementioned characteristics. Both pricing schemes are compared to the Congestion Pricing with Uniform Revenue Refunding (CPURR) scheme through extensive simulation experiments. Initially, we experimentally show for the single OD pair with two routes network, CPURR does not provide a significantly better solution compared to the UE in terms of expected total monetary cost whenever the OD demand is stochastic. For the same network, we also show that the difference in the expected total monetary cost of truck drivers between the OPS and the CPURR solutions becomes higher as the difference between the distinct classes of VOT becomes larger. Finally, the simulation results using the Sioux Falls network demonstrate that both OPS and AOPS consistently outperform CPURR both in expected total travel time and in expected total monetary cost.

  • Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory
    arXiv.cs.GT Pub Date : 2020-01-17
    Yunlong Lu; Kai Yan

    Deep reinforcement learning (RL) has achieved outstanding results in recent years, which has led to a dramatic increase in the number of methods and applications. Recent works are exploring learning beyond single-agent scenarios and considering multi-agent scenarios. However, they face many challenges and are seeking help from traditional game-theoretic algorithms, which, in turn, show bright application promise when combined with modern algorithms and growing computing power. In this survey, we first introduce basic concepts and algorithms in single-agent RL and multi-agent systems; then, we summarize the related algorithms from three aspects. Solution concepts from game theory give inspiration to algorithms which try to evaluate the agents or find better solutions in multi-agent systems. Fictitious self-play has become popular and has had a great impact on multi-agent reinforcement learning algorithms. Counterfactual regret minimization is an important tool for solving games with incomplete information, and has shown great strength when combined with deep learning.

  • Scalable Bid Landscape Forecasting in Real-time Bidding
    arXiv.cs.GT Pub Date : 2020-01-18
    Aritra Ghosh; Saayan Mitra; Somdeb Sarkhel; Jason Xie; Gang Wu; Viswanathan Swaminathan

    In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time. The highest bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price if it wins the auction. For losing auctions, a DSP can only treat its bidding price as the lower bound for the unknown winning price. In the literature, censored regression is typically used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms and significant improvement has been achieved in comparison with the baseline solutions.
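
    The auction rule and the censoring it induces can be sketched in a few lines. This is a minimal illustration of the setting only, not the proposed regression models; the bidder names, the tie-breaking rule and the function names `second_price_auction` and `dsp_observation` are assumptions.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids maps a bidder name to its bid. Ties go to the bidder listed
    first (an assumed tie-breaking rule).
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # sorted() is stable, so equal bids keep their insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


def dsp_observation(dsp, bids):
    """What a DSP learns: the exact price if it wins, else only a lower bound."""
    winner, price = second_price_auction(bids)
    if winner == dsp:
        return {"censored": False, "winning_price": price}
    # A losing DSP only knows the winning price is at least its own bid.
    return {"censored": True, "lower_bound": bids[dsp]}


bids = {"dsp_a": 5.0, "dsp_b": 3.0, "dsp_c": 4.0}
```

    Here `dsp_a` wins and pays 4.0, while `dsp_b` only learns that the winning price is at least 3.0. Data of this partially observed kind is what censored regression is fitted to.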

  • Predict and Match: Prophet Inequalities with Uncertain Supply
    arXiv.cs.GT Pub Date : 2020-01-19
    Reza Alijani; Siddhartha Banerjee; Sreenivas Gollapudi; Kamesh Munagala; Kangning Wang

    We consider the problem of selling perishable items to a stream of buyers in order to maximize social welfare. A seller starts with a set of identical items, and each arriving buyer wants any one item, and has a valuation drawn i.i.d. from a known distribution. Each item, however, disappears after an a priori unknown amount of time that we term the horizon for that item. The seller knows the (possibly different) distribution of the horizon for each item, but not its realization till the item actually disappears. As with the classic prophet inequalities, the goal is to design an online pricing scheme that competes with the prophet that knows the horizon and extracts full social surplus (or welfare). Our main results are for the setting where items have independent horizon distributions satisfying the monotone-hazard-rate (MHR) condition. Here, for any number of items, we achieve a constant-competitive bound via a conceptually simple policy that balances the rate at which buyers are accepted with the rate at which items are removed from the system. We implement this policy via a novel technique of matching via probabilistically simulating departures of the items at future times. Moreover, for a single item and MHR horizon distribution with mean $\mu$, we show a tight result: There is a fixed pricing scheme that has competitive ratio at most $2 - 1/\mu$, and this is the best achievable in this class. We further show that our results are best possible. First, we show that the competitive ratio is unbounded without the MHR assumption even for one item. Further, even when the horizon distributions are i.i.d. MHR and the number of items becomes large, the competitive ratio of any policy is lower bounded by a constant greater than $1$, which is in sharp contrast to the setting with identical deterministic horizons.

  • Incentive-Compatible Diffusion Auctions
    arXiv.cs.GT Pub Date : 2020-01-20
    Bin Li; Dong Hao; Dengji Zhao

    Diffusion auction is a new model in auction design. It can incentivize the buyers who have already joined in the auction to further diffuse the sale information to others via social relations, whereby both the seller's revenue and the social welfare can be improved. Diffusion auctions are essentially non-typical multidimensional mechanism design problems and agents' social relations are complicatedly involved with their bids. In such auctions, incentive-compatibility (IC) means it is best for every agent to honestly report her valuation and fully diffuse the sale information to all her neighbors. Existing work identified some specific mechanisms for diffusion auctions, while a general theory characterizing all incentive-compatible diffusion auctions is still missing. In this work, we identify a sufficient and necessary condition for all dominant-strategy incentive-compatible (DSIC) diffusion auctions. We formulate the monotonic allocation policies in such multidimensional problems and show that any monotonic allocation policy can be implemented in a DSIC diffusion auction mechanism. Moreover, given any monotonic allocation policy, we obtain the optimal payment policy to maximize the seller's revenue.

  • Strategy-Proof Spectrum Allocation among Multiple Operators for Demand Varying Wireless Networks
    arXiv.cs.GT Pub Date : 2020-01-20
    Indu Yadav; Ankur A. Kulkarni; Abhay Karandikar

    Addressing the exponentially increasing data rate demands of end users necessitates efficient spectrum allocation among co-existing operators in licensed and unlicensed spectrum bands to cater to the temporal and spatial variations of traffic in the wireless network. In this paper, we address the spectrum allocation problem among non-cooperative operators via auctions. The classical Vickrey-Clarke-Groves (VCG) approach provides the framework for a strategy-proof and social welfare maximizing auction at high computational complexity, which makes it infeasible for practical implementation. We propose sealed bid auction mechanisms for spectrum allocation which are computationally tractable and hence applicable for allocating spectrum by performing auctions in short durations as per the dynamic load variations of the network. We establish that the proposed algorithm is strategy-proof for uniform demand. Furthermore, for non-uniform demand we propose an algorithm that satisfies weak strategy-proofness. We also consider non-linear increase in the marginal valuations with demand. Simulation results are presented to exhibit the performance comparison of the proposed algorithms with VCG and other existing mechanisms.

  • Stochastic Control Approach to Reputation Games
    arXiv.cs.GT Pub Date : 2016-04-01
    Nuh Aygün Dalkıran; Serdar Yüksel

    Through a stochastic control theoretic approach, we analyze reputation games where a strategic long-lived player acts in a sequential repeated game against a collection of short-lived players. The key assumption in our model is that the information of the short-lived players is nested in that of the long-lived player. This nested information structure is obtained through an appropriate monitoring structure. Under this monitoring structure, we show that, given mild assumptions, the set of Perfect Bayesian Equilibrium payoffs coincides with the set of Markov Perfect Equilibrium payoffs, and hence a dynamic programming formulation can be obtained for the computation of equilibrium strategies of the strategic long-lived player in the discounted setup. We also consider the undiscounted average-payoff setup where we obtain an optimal equilibrium strategy of the strategic long-lived player under further technical conditions. We then use this optimal strategy in the undiscounted setup as a tool to obtain a tight upper payoff bound for the arbitrarily patient long-lived player in the discounted setup. Finally, by using measure concentration techniques, we obtain a refined lower payoff bound on the value of reputation in the discounted setup. We also study the continuity of equilibrium payoffs in the prior beliefs.

  • Learning from Neighbors about a Changing State
    arXiv.cs.GT Pub Date : 2018-01-06
    Krishna Dasaratha; Benjamin Golub; Nir Hak

    Agents learn about a changing state using private signals and past actions of neighbors in a network. We characterize equilibrium learning and social influence in this setting. We then examine when agents can aggregate information well, responding quickly to recent changes. A key sufficient condition for good aggregation is that each individual's neighbors have sufficiently different types of private information. In contrast, when signals are homogeneous, aggregation is suboptimal on any network. We also examine behavioral versions of the model, and show that achieving good aggregation requires a sophisticated understanding of correlations in neighbors' actions. The model provides a Bayesian foundation for a tractable learning dynamic in networks, closely related to the DeGroot model, and offers new tools for counterfactual and welfare analyses.

  • A Truthful Cardinal Mechanism for One-Sided Matching
    arXiv.cs.GT Pub Date : 2019-03-19
    Rediet Abebe; Richard Cole; Vasilis Gkatzelis; Jason D. Hartline

    We revisit the well-studied problem of designing mechanisms for one-sided matching markets, where a set of $n$ agents needs to be matched to a set of $n$ heterogeneous items. Each agent $i$ has a value $v_{i,j}$ for each item $j$, and these values are private information that the agents may misreport if doing so leads to a preferred outcome. Ensuring that the agents have no incentive to misreport requires a careful design of the matching mechanism, and mechanisms proposed in the literature mitigate this issue by eliciting only the ordinal preferences of the agents, i.e., their ranking of the items from most to least preferred. However, the efficiency guarantees of these mechanisms are based only on weak measures that are oblivious to the underlying values. In this paper we achieve stronger performance guarantees by introducing a mechanism that truthfully elicits the full cardinal preferences of the agents, i.e., all of the $v_{i,j}$ values. We evaluate the performance of this mechanism using the much more demanding Nash bargaining solution as a benchmark, and we prove that our mechanism significantly outperforms all ordinal mechanisms (even non-truthful ones). To prove our approximation bounds, we also study the population monotonicity of the Nash bargaining solution in the context of matching markets, providing both upper and lower bounds which are of independent interest.

  • Contiguous Cake Cutting: Hardness Results and Approximation Algorithms
    arXiv.cs.GT Pub Date : 2019-11-13
    Paul W. Goldberg; Alexandros Hollender; Warut Suksompong

    We study the fair allocation of a cake, which serves as a metaphor for a divisible resource, under the requirement that each agent should receive a contiguous piece of the cake. While it is known that no finite envy-free algorithm exists in this setting, we exhibit efficient algorithms that produce allocations with low envy among the agents. We then establish NP-hardness results for various decision problems on the existence of envy-free allocations, such as when we fix the ordering of the agents or constrain the positions of certain cuts. In addition, we consider a discretized setting where indivisible items lie on a line and show a number of hardness results extending and strengthening those from prior work. Finally, we investigate connections between approximate and exact envy-freeness, as well as between continuous and discrete cake cutting.

  • Sequential decomposition of graphon mean field games
    arXiv.cs.GT Pub Date : 2020-01-16
    Deepanshu Vasal; Rajesh K Mishra; Sriram Vishwanath

    In this paper, we present a sequential decomposition algorithm to compute the graphon mean field equilibrium (GMFE) of a dynamic graphon mean field game (GMFG). We consider a large population of players sequentially making strategic decisions, where the actions of each player affect their neighbors, as captured in a graph generated by a known graphon. Each player observes a private state and also common information in the form of a graphon mean-field population state, which represents the empirical networked distribution of other players' types. We consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute GMFE that depend on both a player's private type and the current (dynamic) population state determined through the graphon. Each step in this algorithm consists of solving a fixed-point equation. We provide conditions on model parameters for which there exists such a GMFE. Using this algorithm, we obtain the GMFE for a specific security setup in cyber physical systems for different graphons that capture the interactions between the nodes in the system.

  • A Study of Incentive Compatibility and Stability Issues in Fractional Matchings
    arXiv.cs.GT Pub Date : 2020-01-16
    Shivika Narang; Y Narahari

    Stable matchings have been studied extensively in both economics and computer science literature. However, most of the work considers only integral matchings. The study of stable fractional matchings is fairly recent and moreover, is scarce. This paper reports the first investigation into the important but unexplored topic of incentive compatibility of matching mechanisms to find stable fractional matchings. We focus our attention on matching instances under strict preferences. First, we make the significant observation that there are matching instances for which no mechanism that produces a stable fractional matching is incentive compatible. We then characterize restricted settings of matching instances admitting unique stable fractional matchings. Specifically, we show that there will exist a unique stable fractional matching for a matching instance if and only if the given matching instance satisfies what we call the conditional mutual first preference property (CMFP). For this class of instances, we prove that every mechanism that produces the unique stable fractional matching is (a) incentive compatible and (b) resistant to coalitional manipulations. We provide a polynomial-time algorithm to compute the stable fractional matching as well. The algorithm uses envy-graphs, hitherto unused in the study of stable matchings.

  • Design of Trusted Market Platforms using Permissioned Blockchains and Game Theory
    arXiv.cs.GT Pub Date : 2020-01-16
    Shivika Narang

    The blockchain concept forms the backbone of a new wave technology that promises to be deployed extensively in a wide variety of industrial and societal applications. Governments, financial institutions, banks, industrial supply chains, service companies, and even educational institutions and hospitals are investing in a substantial manner in the hope of improving business efficiency and operational robustness through deployment of blockchain technology. This thesis work is concerned with designing trustworthy business-to-business (B2B) market platforms drawing upon blockchain technology and game theory. The proposed platform is built upon three key ideas. First, we use permissioned blockchains with smart contracts as a technically sound approach for building the B2B platform. The blockchain deploys smart contracts that govern the interactions of enterprise buyers and sellers. Second, the smart contracts are designed using a rigorous analysis of a repeated game model of the strategic interactions between buyers and sellers. We show that such smart contracts induce honest behavior from buyers and sellers. Third, we embed cryptographic regulation protocols into the permissioned blockchain to ensure that business sensitive information is not revealed to the competitors. We believe our work is an important step in the direction of building a powerful B2B platform that maximizes social welfare and enables trusted collaboration between strategic enterprise agents.

  • How Good Is a Two-Party Election Game?
    arXiv.cs.GT Pub Date : 2020-01-16
    Chuang-Chieh Lin; Chi-Jen Lu; Po-An Chen

    In this paper, we propose a simple and intuitive model to investigate the efficiency of the two-party election system, especially regarding the nomination process. Each of the two parties has its own candidates, and each of them brings utilities for the people including the supporters and non-supporters. In an election, each party nominates exactly one of its candidates to compete against the other party's. The candidate wins the election with higher odds if he or she brings more utility for all the people. We model such competition as a "two-party election game" such that each party is a player with two or more pure strategies corresponding to its potential candidates, and the payoff of each party is a mixed utility from a selected pair of competing candidates. By looking into the three models, namely, the linear link, Bradley-Terry, and the softmax models, which differ in how to formulate a candidate's winning odds against the competing candidate, we show that the two-party election game may neither have any pure Nash equilibrium nor a bounded price of anarchy. Nevertheless, by considering the conventional "egoism", which states that any candidate benefits his/her party's supporters more than any candidate from the competing party does, we prove that the two-party election game in both the linear link model and the softmax model always has pure Nash equilibria, and furthermore, the price of anarchy is constantly bounded.
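
    The three winning-odds models named above can be sketched as functions of the two candidates' utilities. The abstract does not give the exact parameterisation, so the forms below are assumptions: the Bradley-Terry and softmax expressions are the common textbook ones, and the `scale` and `temperature` parameters and all function names are hypothetical.

```python
import math


def bradley_terry(u_a, u_b):
    """P(A beats B) = u_a / (u_a + u_b), for positive utilities."""
    return u_a / (u_a + u_b)


def softmax_odds(u_a, u_b, temperature=1.0):
    """P(A beats B) from a two-way softmax over the utilities."""
    # Same as exp(u_a/T) / (exp(u_a/T) + exp(u_b/T)), in logistic form.
    return 1.0 / (1.0 + math.exp((u_b - u_a) / temperature))


def linear_link(u_a, u_b, scale):
    """P(A beats B) grows linearly in the utility gap.

    Assumes scale >= |u_a - u_b| so the result stays within [0, 1].
    """
    return 0.5 * (1.0 + (u_a - u_b) / scale)


# In every model the two candidates' winning odds sum to one.
pairs = [
    (bradley_terry(3.0, 1.0), bradley_terry(1.0, 3.0)),
    (softmax_odds(3.0, 1.0), softmax_odds(1.0, 3.0)),
    (linear_link(3.0, 1.0, 4.0), linear_link(1.0, 3.0, 4.0)),
]
```

    All three map a larger utility to higher winning odds and differ only in how steeply the odds respond to the utility gap.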

  • A meta analysis of tournaments and an evaluation of performance in the Iterated Prisoner's Dilemma
    arXiv.cs.GT Pub Date : 2020-01-16
    Nikoleta E. Glynatsi; Vincent A. Knight

    The Iterated Prisoner's Dilemma has been used for decades as a model of behavioural interactions. From the celebrated performance of Tit for Tat, to the introduction of the zero-determinant strategies, to the use of sophisticated structures such as neural networks, the literature has been exploring the performance of strategies in the game for years. The results of the literature, however, have been relying on the performance of specific strategies in a finite number of tournaments. This manuscript evaluates 195 strategies' effectiveness in more than 40000 tournaments. The top ranked strategies are presented, and moreover, the impact of features on their success is analysed using machine learning techniques. The analysis determines that the cooperation ratio of a strategy in a given tournament compared to the mean and median cooperator is the most important feature. The conclusions are distinct for different types of tournaments. For instance, a strategy with a theory of mind would aim to be the mean/median cooperator in standard tournaments, whereas in tournaments with probabilistic ending it would aim to cooperate 10% of the times the median cooperator did.
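    To make the "cooperation ratio" feature concrete, here is a small sketch of a single Iterated Prisoner's Dilemma match. It is not the tooling used in the manuscript: the payoff values (R=3, S=0, T=5, P=1) are the conventional ones and are assumed here, and all function names are hypothetical.

```python
# Conventional payoffs (assumed): mutual cooperation 3, sucker 0,
# temptation 5, mutual defection 1.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(own_history, opp_history):
    return "D"

def play_match(strategy_1, strategy_2, turns):
    """Play a fixed-length match; return both scores and both histories."""
    h1, h2 = [], []
    score_1 = score_2 = 0
    for _ in range(turns):
        a1 = strategy_1(h1, h2)
        a2 = strategy_2(h2, h1)
        p1, p2 = PAYOFFS[(a1, a2)]
        score_1 += p1
        score_2 += p2
        h1.append(a1)
        h2.append(a2)
    return score_1, score_2, h1, h2

def cooperation_ratio(history):
    """Fraction of turns in which the player cooperated."""
    return history.count("C") / len(history)
```

    Against `always_defect`, `tit_for_tat` cooperates only on the first turn, so over ten turns its cooperation ratio is 0.1; a tournament would aggregate such ratios over many opponents.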

  • On the Computational Complexity of Decision Problems about Multi-Player Nash Equilibria
    arXiv.cs.GT Pub Date : 2020-01-15
    Marie Louisa Tølbøll Berthelsen; Kristoffer Arnsfelt Hansen

    We study the computational complexity of decision problems about Nash equilibria in $m$-player games. Several such problems have recently been shown to be computationally equivalent to the decision problem for the existential theory of the reals, or stated in terms of complexity classes, $\exists\mathbb{R}$-complete, when $m\geq 3$. We show that, unless they turn into trivial problems, they are $\exists\mathbb{R}$-hard even for 3-player zero-sum games. We also obtain new results about several other decision problems. We show that when $m\geq 3$ the problems of deciding if a game has a Pareto optimal Nash equilibrium or deciding if a game has a strong Nash equilibrium are $\exists\mathbb{R}$-complete. The latter result rectifies a previous claim of NP-completeness in the literature. We show that deciding if a game has an irrational valued Nash equilibrium is $\exists\mathbb{R}$-hard, answering a question of Bilò and Mavronicolas, and address also the computational complexity of deciding if a game has a rational valued Nash equilibrium. These results also hold for 3-player zero-sum games. Our proof methodology applies to corresponding decision problems about symmetric Nash equilibria in symmetric games as well, and in particular our new results carry over to the symmetric setting. Finally we show that deciding whether a symmetric $m$-player game has a non-symmetric Nash equilibrium is $\exists\mathbb{R}$-complete when $m\geq 3$, answering a question of Garg, Mehta, Vazirani, and Yazdanbod.

  • Faster Regret Matching
    arXiv.cs.GT Pub Date : 2020-01-14
    Dawen Wu

    The regret matching algorithm proposed by Sergiu Hart is one of the most powerful iterative methods for finding correlated equilibria. However, it may not be efficient enough, especially in large-scale problems. We first rewrite the algorithm in a computationally practical way based on the idea of the regret matrix. Moreover, the rewriting makes the original algorithm easier to understand. Then, by modifying the original algorithm, we introduce a novel variant, namely faster regret matching. The experimental results show that the novel algorithm has a speed advantage compared with the original one.
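    For orientation, the core step of regret matching can be sketched as below. This is the simpler unconditional (external-regret) variant, not the conditional-regret procedure that targets correlated equilibria, and not the regret-matrix rewriting or the faster variant proposed in the paper; the function names and the `payoff[a][b]` layout are assumptions made for illustration.

```python
def strategy_from_regrets(regrets):
    """Play each action with probability proportional to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total <= 0.0:
        # No action is regretted: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]

def update_regrets(regrets, payoff, played, opponent_action):
    """Accumulate, per action, how much better it would have done this round.

    payoff[a][b] is our payoff for playing a against opponent action b.
    """
    realized = payoff[played][opponent_action]
    return [
        r + payoff[a][opponent_action] - realized
        for a, r in enumerate(regrets)
    ]
```

    For example, in rock-paper-scissors, after playing rock against paper the cumulative regrets become `[0, 1, 2]`, so the next strategy puts weight 1/3 on paper and 2/3 on scissors.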

  • Inducing Cooperation in Multi-Agent Games Through Status-Quo Loss
    arXiv.cs.GT Pub Date : 2020-01-15
    Pinkesh Badjatiya; Mausoom Sarkar; Abhishek Sinha; Siddharth Singh; Nikaash Puri; Balaji Krishnamurthy

    Social dilemma situations bring out the conflict between individual and group rationality. When individuals act rationally in such situations, the group suffers sub-optimal outcomes. The Iterative Prisoner's Dilemma (IPD) is a two-player game that offers a theoretical framework to model and study such social situations. In the Prisoner's Dilemma, individualistic behavior leads to mutual defection and sub-optimal outcomes. This result is in contrast to what one observes in human groups, where humans often sacrifice individualistic behavior for the good of the collective. It is interesting to study how and why such cooperative and individually irrational behavior emerges in human groups. To this end, recent work models this problem by treating each player as a Deep Reinforcement Learning (RL) agent and evolves cooperative behavioral policies through internal information or reward sharing mechanisms. We propose an approach to evolve cooperative behavior between RL agents playing the IPD game without sharing rewards, internal details (weights, gradients), or a communication channel. We introduce a Status-Quo loss (SQLoss) that incentivizes cooperative behavior by encouraging policy stationarity. We also describe an approach to transform a two-player game (with visual inputs) into its IPD formulation through self-supervised skill discovery (IPDistill). We show how our approach outperforms existing approaches in the Iterative Prisoner's Dilemma and the two-player Coin game.

  • No-regret Learning in Cournot Games
    arXiv.cs.GT Pub Date : 2019-06-15
    Yuanyuan Shi; Baosen Zhang

    In this work, we study the interaction of strategic players in continuous action Cournot games with limited information feedback. Cournot game is the essential model for many socio-economic systems where players learn and compete. In addition, in many practical settings these players do not have full knowledge of the system or of each other. In this limited information setting, it becomes important to understand the dynamics and limiting behavior of the players. Specifically, we assume players follow strategies such that in hindsight their payoffs are not exceeded by any single deviating action. Given this no-regret guarantee, we prove that under standard assumptions, the players' joint action (both in the sense of time average and final iteration convergence) converges to the unique Nash equilibrium. In addition, our results naturally extend the existing regret analysis on time average convergence to obtain final iteration convergence rates. Together, our work presents significantly sharper and generalized convergence results, and shows how exploiting the game information feedback can influence the convergence rates.

  • Fixed Points of the Set-Based Bellman Operator
    arXiv.cs.GT Pub Date : 2020-01-13
    Sarah H. Q. Li; Assalé Adjé; Pierre-Loïc Garoche; Behçet Açıkmeşe

    Motivated by uncertain parameters encountered in Markov decision processes (MDPs), we study the effect of parameter uncertainty on Bellman operator-based methods. Specifically, we consider a family of MDPs where the cost parameters are from a given compact set. We then define a Bellman operator acting on an input set of value functions to produce a new set of value functions as the output under all possible variations in the cost parameters. Finally we prove the existence of a fixed point of this set-based Bellman operator by showing that it is a contractive operator on a complete metric space.

  • Hierarchical Network Formation Games
    arXiv.cs.GT Pub Date : 2020-01-14
    Pedro Cisneros-Velarde; Francesco Bullo

    We propose a novel network formation game that explains the emergence of various hierarchical structures in groups where self-interested or utility-maximizing individuals decide to establish or sever relationships of authority or collaboration among themselves. We consider two settings: we first consider individuals who do not seek the other party's consent when establishing a relationship and then individuals who do. For both settings, we formally relate the emerged hierarchical structures with the novel inclusion of well-motivated hierarchy promoting terms in the individuals' utility functions. We first analyze the game via a static analysis and characterize all the hierarchical structures that can be formed as its solutions. We then consider the game played dynamically under stochastic interactions among individuals implementing best-response dynamics and analyze the nature of the converged networks.

  • Smooth markets: A basic mechanism for organizing gradient-based learners
    arXiv.cs.GT Pub Date : 2020-01-14
    David Balduzzi; Wojiech M Czarnecki; Thomas W Anthony; Ian M Gemp; Edward Hughes; Joel Z Leibo; Georgios Piliouras; Thore Graepel

    With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.

  • A Game-Theoretic Approach to a Task Delegation Problem
    arXiv.cs.GT Pub Date : 2020-01-11
    Donya G. Dobakhshari; Lav R. Varshney; Vijay Gupta

    We study a setting in which a principal selects an agent to execute a collection of tasks according to a specified priority sequence. Agents, however, have their own individual priority sequences according to which they wish to execute the tasks. There is information asymmetry since each priority sequence is private knowledge for the individual agent. We design a mechanism for selecting the agent and incentivizing the selected agent to realize a priority sequence for executing the tasks that achieves socially optimal performance. Our proposed mechanism consists of two parts. First, the principal runs an auction to select an agent to allocate tasks to with minimum declared priority sequence misalignment. Then, the principal rewards the agent according to the realized priority sequence with which the tasks were performed. We show that the proposed mechanism is individually rational and incentive compatible. Further, it is also socially optimal for the case of linear cost of priority sequence modification for the agents.

  • Incentive Design in a Distributed Problem with Strategic Agents
    arXiv.cs.GT Pub Date : 2018-02-19
    Donya Ghavidel; Pratyush Chakraborty; Enrique Baeyens; Vijay Gupta; Pramod P. Khargonekar

    In this paper, we consider a general distributed system with multiple agents who select and then implement actions in the system. The system has an operator with a centralized objective. The agents, on the other hand, are self-interested and strategic in the sense that each agent optimizes its own individual objective. The operator aims to mitigate this misalignment by designing an incentive scheme for the agents. The problem is difficult due to the cost functions of the agents being coupled, the objective of the operator not being social welfare, and the operator having no direct control over actions being implemented by the agents. This problem has been studied in many fields, particularly in mechanism design and cost allocation. However, mechanism design typically assumes that the operator has knowledge of the cost functions of the agents and the actions being implemented by the operator. On the other hand, cost allocation classically assumes that agents do not anticipate the effect of their actions on the incentive that they obtain. We remove these assumptions and present an incentive rule for this setup by bridging the gap between mechanism design and classical cost allocation. We analyze whether the proposed design satisfies various desirable properties such as social optimality, budget balance, participation constraint, and so on. We also analyze which of these properties can be satisfied if the assumptions of cost functions of the agents being private and the agents being anticipatory are relaxed.

  • Strategic Formation and Reliability of Supply Chain Networks
    arXiv.cs.GT Pub Date : 2019-09-17
    Victor Amelkin; Rakesh Vohra

    Supply chains are the backbone of the global economy. Disruptions to them can be costly. Centrally managed supply chains invest in ensuring their resilience. Decentralized supply chains, however, must rely upon the self-interest of their individual components to maintain the resilience of the entire chain. We examine the incentives that independent self-interested agents have in forming a resilient supply chain network in the face of production disruptions and competition. In our model, competing suppliers are subject to yield uncertainty (they deliver less than ordered) and congestion (lead time uncertainty or, "soft" supply caps). Competing retailers must decide which suppliers to link to based on both price and reliability. In the presence of yield uncertainty only, the resulting supply chain networks are sparse. Retailers concentrate their links on a single supplier, counter to the idea that they should mitigate yield uncertainty by diversifying their supply base. This happens because retailers benefit from supply variance. It suggests that competition will amplify output uncertainty. When congestion is included as well, the resulting networks are denser and resemble the bipartite expander graphs that have been proposed in the supply chain literature, thereby, providing the first example of endogenous formation of resilient supply chain networks, without resilience being explicitly encoded in payoffs. Finally, we show that a supplier's investments in improved yield can make it worse off. This happens because high production output saturates the market, which, in turn lowers prices and profits for participants.

  • A Game Generative Network Framework with its Application to Relationship Inference
    arXiv.cs.GT Pub Date : 2020-01-11
    Jie Huang; Fanghua Ye; Xu Chen

    A game process is a system where the decisions of one agent can influence the decisions of other agents. In the real world, social influences and relationships between agents may influence the decision making of agents with game behaviors. In turn, this also gives us the possibility to mine some information from such agents, such as the relationships between them, through the interactions in a game process. In this paper, we propose a Game Generative Network (GGN) framework which utilizes the deviation between the real game outcome and the ideal game model to build networks for game processes, which opens a door for understanding more about agents with game behaviors by graph mining approaches. We apply GGN to the team game as a concrete application and conduct experiments on relationship inference tasks.

  • Permissioned Blockchain Revisited: A Byzantine Game-Theoretical Perspective
    arXiv.cs.GT Pub Date : 2020-01-12
    Dongfang Zhao

    Despite the popularity and practical applicability of blockchains, there is very limited work on the theoretical foundation of blockchains: The lack of rigorous theory and analysis behind the curtain of blockchains has severely staggered its broader applications. This paper attempts to lay out a theoretical foundation for a specific type of blockchains---the ones requiring basic authenticity from the participants, also called permissioned blockchains. We formulate permissioned blockchain systems and operations into a game-theoretical problem by incorporating constraints implied by the wisdom from distributed computing and Byzantine systems. We show that in a noncooperative blockchain game (NBG), a Nash equilibrium can be efficiently found in a closed-form even though the game involves more than two players. Somewhat surprisingly, the simulation results of the Nash equilibrium imply that the game can reach a stable status regardless of the number of Byzantine nodes and trustworthy players. We then study a harder problem where players are allowed to form coalitions: the coalitional blockchain game (CBG). We show that although the Shapley value for a CBG can be expressed in a more succinct form, its core is empty.

  • Supply Network Formation and Fragility
    arXiv.cs.GT Pub Date : 2020-01-12
    Matthew Elliott; Benjamin Golub; Matthew V. Leduc

    We model the production of complex goods in a large supply network. Firms source several essential inputs through relationships with other firms. Relationships may fail, and given this idiosyncratic risk, firms multisource inputs and make costly investments to make relationships with suppliers stronger (less likely to fail). We find that aggregate production is discontinuous in the strength of these relationships. This has stark implications for equilibrium outcomes. We give conditions under which the supply network is endogenously fragile, so that arbitrarily small negative shocks to relationship strength lead to a large, discontinuous drop in aggregate output.

  • Games Where You Can Play Optimally with Finite Memory
    arXiv.cs.GT Pub Date : 2020-01-12
    Patricia Bouyer; Stéphane Le Roux; Youssouf Oualhadj; Mickael Randour; Pierre Vandenhove

    For decades, two-player (antagonistic) games on graphs have been a framework of choice for many important problems in theoretical computer science. A notorious one is controller synthesis, which can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the system in a game against its antagonistic environment. Depending on the specification, optimal strategies might be simple or quite complex, for example having to use (possibly infinite) memory. Hence, research strives to understand which settings allow for simple strategies. In 2005, Gimbert and Zielonka provided a complete characterization of preference relations (a formal framework to model specifications and game objectives) that admit memoryless optimal strategies for both players. In the last fifteen years however, practical applications have driven the community toward games with complex or multiple objectives, where memory --- finite or infinite --- is almost always required. Despite much effort, the exact frontiers of the class of preference relations that admit finite-memory optimal strategies still elude us. In this work, we establish a complete characterization of preference relations that admit optimal strategies using arena-independent finite memory, generalizing the work of Gimbert and Zielonka to the finite-memory case. We also prove an equivalent to their celebrated corollary of utmost practical interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our results with regard to the literature: our work completely covers the case of arena-independent memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).

  • Computational Hardness of Multidimensional Subtraction Games
    arXiv.cs.GT Pub Date : 2020-01-12
    Vladimir Gurvich; Michael Vyalyi

    We study the algorithmic complexity of solving subtraction games in a fixed dimension with a finite difference set. We prove that there exists a game in this class such that any algorithm solving the game runs in exponential time. We also prove the existence of a game in this class such that solving the game is PSPACE-hard. The results are based on the construction introduced by Larsson and Wästlund. It relates subtraction games and cellular automata.

  • Bad cycles in iterative Approval Voting
    arXiv.cs.GT Pub Date : 2020-01-13
    Benoît Kloeckner (LAMA)

    This article is about synchronized iterative voting in the context of Approval Voting. Assuming that, before an election, successive polls occur to which voters react strategically, we shall exhibit examples showing the possibility of cycles with strong negative properties (in particular, non election of an existing Condorcet winner, or possible election of a candidate strongly rejected by a majority of the electorate). We also show that such cycles persist if only a proportion of the voters adjust their ballot at each iteration and if their strategy changes when close ties occur.

  • Resource Sharing in the Edge: A Distributed Bargaining-Theoretic Approach
    arXiv.cs.GT Pub Date : 2020-01-13
    Faheem Zafari; Prithwish Basu; Kin K. Leung; Jian Li; Ananthram Swami; Don Towsley

    The growing demand for edge computing resources, particularly due to increasing popularity of Internet of Things (IoT), and distributed machine/deep learning applications poses a significant challenge. On the one hand, certain edge service providers (ESPs) may not have sufficient resources to satisfy their applications according to the associated service-level agreements. On the other hand, some ESPs may have additional unused resources. In this paper, we propose a resource-sharing framework that allows different ESPs to optimally utilize their resources and improve the satisfaction level of applications subject to constraints such as communication cost for sharing resources across ESPs. Our framework considers that different ESPs have their own objectives for utilizing their resources, thus resulting in a multi-objective optimization problem. We present an $N$-person Nash Bargaining Solution (NBS) for resource allocation and sharing among ESPs with Pareto optimality guarantee. Furthermore, we propose a distributed, primal-dual algorithm to obtain the NBS by proving that the strong-duality property holds for the resultant resource sharing optimization problem. Using synthetic and real-world data traces, we show numerically that the proposed NBS based framework not only enhances the ability to satisfy applications' resource demands, but also improves utilities of different ESPs.

  • A Universal Attractor Decomposition Algorithm for Parity Games
    arXiv.cs.GT Pub Date : 2020-01-13
    Marcin Jurdziński; Rémi Morvan

    An attractor decomposition meta-algorithm for solving parity games is given that generalizes the classic McNaughton-Zielonka algorithm and its recent quasi-polynomial variants due to Parys (2019), and to Lehtinen, Schewe, and Wojtczak (2019). The central concepts studied and exploited are attractor decompositions of dominia in parity games and the ordered trees that describe the inductive structure of attractor decompositions. The main technical results include the embeddable decomposition theorem and the dominion separation theorem that together help establish a precise structural condition for the correctness of the universal algorithm: it suffices that the two ordered trees given to the algorithm as inputs embed the trees of some attractor decompositions of the largest dominia for each of the two players, respectively. The universal algorithm yields McNaughton-Zielonka, Parys's, and Lehtinen-Schewe-Wojtczak algorithms as special cases when suitable universal trees are given to it as inputs. The main technical results provide a unified proof of correctness and deep structural insights into those algorithms. A symbolic implementation of the universal algorithm is also given that improves the symbolic space complexity of solving parity games in quasi-polynomial time from $O(d \lg n)$---achieved by Chatterjee, Dvořák, Henzinger, and Svozil (2018)---down to $O(\lg d)$, where $n$ is the number of vertices and $d$ is the number of distinct priorities in a parity game. This not only exponentially improves the dependence on $d$, but it also entirely removes the dependence on $n$.
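    The basic building block behind attractor decompositions is the attractor of a target set: the vertices from which one player can force a visit to the target. The sketch below is the textbook backward computation only, not the universal algorithm of the paper; the graph encoding (`edges`, `owner`) and the function name are assumptions made for illustration.

```python
from collections import deque

def attractor(vertices, edges, owner, player, target):
    """Vertices from which `player` can force the play into `target`.

    edges: dict mapping each vertex to its list of successors.
    owner: dict mapping each vertex to the player (0 or 1) who moves there.
    """
    # Build predecessor lists for the backward search.
    preds = {v: [] for v in vertices}
    for v in vertices:
        for w in edges[v]:
            preds[w].append(v)
    # For opponent vertices, count successors not yet known to be attracted.
    remaining = {v: len(edges[v]) for v in vertices}
    attr = set(target)
    queue = deque(attr)
    while queue:
        w = queue.popleft()
        for v in preds[w]:
            if v in attr:
                continue
            if owner[v] == player:
                # The player can choose the edge v -> w.
                attr.add(v)
                queue.append(v)
            else:
                # The opponent is attracted only if every edge leads inside.
                remaining[v] -= 1
                if remaining[v] == 0:
                    attr.add(v)
                    queue.append(v)
    return attr
```

    Recursive parity game algorithms of the McNaughton-Zielonka family repeatedly remove such attractors from the game graph and recurse on what is left.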

  • Good-for-games $ω$-Pushdown Automata
    arXiv.cs.GT Pub Date : 2020-01-13
    Karoliina Lehtinen; Martin Zimmermann

    We introduce good-for-games $\omega$-pushdown automata ($\omega$-GFG-PDA). These are automata whose nondeterminism can be resolved based on the run constructed thus far. Good-for-gameness enables automata to be composed with games, trees, and other automata, applications which otherwise require deterministic automata. Our main results show that $\omega$-GFG-PDA are more expressive than deterministic $\omega$-pushdown automata and that solving infinite games with winning conditions specified by $\omega$-GFG-PDA is EXPTIME-complete, i.e., we have identified a new class of $\omega$-contextfree winning conditions for which solving games is decidable. This means in particular that the universality problem is in EXPTIME as well. Moreover, we study closure properties of the class of languages recognized by $\omega$-GFG-PDA and decidability of good-for-gameness of $\omega$-pushdown automata and languages.

  • One-Clock Priced Timed Games are PSPACE-hard
    arXiv.cs.GT Pub Date : 2020-01-13
    John Fearnley; Rasmus Ibsen-Jensen; Rahul Savani

    The main result of this paper is that computing the value of a one-clock priced timed game (OCPTG) is PSPACE-hard. Along the way, we provide a family of OCPTGs that have an exponential number of event points. Both results hold even in very restricted classes of games such as DAGs with treewidth three. Finally, we provide a number of positive results, including polynomial-time algorithms for even more restricted classes of OCPTGs such as trees.

  • Targeting Interventions in Networks
    arXiv.cs.GT Pub Date : 2017-10-16
    Andrea Galeotti; Benjamin Golub; Sanjeev Goyal

    We study games in which a network mediates strategic spillovers and externalities among the players. How does a planner optimally target interventions that change individuals' private returns to investment? We analyze this question by decomposing any intervention into orthogonal principal components, which are determined by the network and are ordered according to their associated eigenvalues. There is a close connection between the nature of spillovers and the representation of various principal components in the optimal intervention. In games of strategic complements (substitutes), interventions place more weight on the top (bottom) principal components, which reflect more global (local) network structure. For large budgets, optimal interventions are simple -- they involve a single principal component.

  • Game of Thrones: Fully Distributed Learning for Multi-Player Bandits
    arXiv.cs.GT Pub Date : 2018-10-26
    Ilai Bistritz; Amir Leshem

    We consider an N-player multi-armed bandit game where each player chooses one out of M arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all receive zero reward. Performance is measured using the expected sum of regrets, compared to optimal assignment of arms to players that maximizes the sum of expected rewards. We assume that each player only knows her actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-$O(\log T)$. This is the first algorithm to achieve a near order optimal regret in this fully distributed scenario. All other works have assumed that either all players have the same vector of expected rewards or that communication between players is possible.

  • Multi-Agent Common Knowledge Reinforcement Learning
    arXiv.cs.GT Pub Date : 2018-10-27
    Christian A. Schroeder de Witt; Jakob N. Foerster; Gregory Farquhar; Philip H. S. Torr; Wendelin Boehmer; Shimon Whiteson

    Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each others' observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.

  • Dynamic Interaction between Shared Autonomous Vehicles and Public Transit: A Competitive Perspective
    arXiv.cs.GT Pub Date : 2020-01-09
    Baichuan Mo; Zhejing Cao; Hongmou Zhang; Yu Shen; Jinhua Zhao

    The emergence of autonomous vehicles (AVs) is anticipated to influence the public transportation (PT) system. Many possible relationships between AV and PT are proposed depending on the policy and institution, where competition and cooperation are two main categories. This paper focuses on the former in a hypothetical scenario: "if both AV and PT operators were only profit-oriented." We aim to quantitatively evaluate the system performance (e.g. level of service, operators' financial viability, transport efficiency) when AV and PT are profit-oriented competitors with dynamic adjustable supply strategies under certain policy constraints. We assume AV can adjust the fleet size and PT can adjust the headway. Service fare and bus routes are fixed. The competition process is analyzed through an agent-based simulation platform, which incorporates a proposed heuristic dynamic supply updating algorithm (HDSUA). The first-mile scenario in the Tampines area of Singapore is selected as the case study, where only buses are considered for the PT system. We found that when AV and bus operators are given the flexibility to adjust supply, both of them will re-distribute their supply spatially and temporally, leading to higher profits. In the temporal dimension, both AV and bus will concentrate their supplies in morning and evening peak hours, and reduce the supplies in off-peak hours. The competition between AV and PT decreases passengers' travel time but increases their travel cost. The generalized travel cost is still reduced when counting the value of time. The bus supply adjustment can increase the bus average load and reduce total passenger car equivalent (PCE), which is good for transport efficiency and sustainability. But the AV supply adjustment shows the opposite effect. Overall, the competition does not necessarily bring out loss-gain results. A win-win outcome is also possible under certain policy interventions.

  • Behavioral and Game-Theoretic Security Investments in Interdependent Systems Modeled by Attack Graphs
    arXiv.cs.GT Pub Date : 2020-01-09
    Mustafa Abdallah; Parinaz Naghizadeh; Ashish R. Hota; Timothy Cason; Saurabh Bagchi; Shreyas Sundaram

    We consider a system consisting of multiple interdependent assets, and a set of defenders, each responsible for securing a subset of the assets against an attacker. The interdependencies between the assets are captured by an attack graph, where an edge from one asset to another indicates that if the former asset is compromised, an attack can be launched on the latter asset. Each edge has an associated probability of successful attack, which can be reduced via security investments by the defenders. In such scenarios, we investigate the security investments that arise under certain features of human decision-making that have been identified in behavioral economics. In particular, humans have been shown to perceive probabilities in a nonlinear manner, typically overweighting low probabilities and underweighting high probabilities. We show that suboptimal investments can arise under such weighting in certain network topologies. We also show that pure strategy Nash equilibria exist in settings with multiple (behavioral) defenders, and study the inefficiency of the equilibrium investments by behavioral defenders compared to a centralized socially optimal solution.
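    The nonlinear perception of probabilities described above is often modelled with a parametric weighting function. The sketch below uses the Prelec form $w(p) = \exp(-(-\ln p)^{\alpha})$ with $\alpha \in (0, 1]$; that the paper uses exactly this form is an assumption here, and the function name is hypothetical.

```python
import math

def prelec_weight(p, alpha):
    """Perceived probability under the Prelec weighting function.

    alpha = 1 gives w(p) = p (no distortion); alpha < 1 overweights
    small probabilities and underweights large ones.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if p == 0.0:
        return 0.0  # limit of the formula as p -> 0
    return math.exp(-((-math.log(p)) ** alpha))
```

    With `alpha = 0.5`, a true attack probability of 0.01 is perceived as roughly 0.12, while 0.9 is perceived as roughly 0.72; a behavioral defender who weights edge probabilities this way may therefore misjudge where an investment reduces risk most.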

  • Optimal dynamic information provision in traffic routing
    arXiv.cs.GT Pub Date : 2020-01-09
    Emily Meigs; Francesca Parise; Asuman Ozdaglar; Daron Acemoglu

    We consider a two-road dynamic routing game where the state of one of the roads (the "risky road") is stochastic and may change over time. This generates room for experimentation. A central planner may wish to induce some of the (finite number of atomic) agents to use the risky road even when the expected cost of travel there is high in order to obtain accurate information about the state of the road. Since agents are strategic, we show that in order to generate incentives for experimentation the central planner however needs to limit the number of agents using the risky road when the expected cost of travel on the risky road is low. In particular, because of congestion, too much use of the risky road when the state is favorable would make experimentation no longer incentive compatible. We characterize the optimal incentive compatible recommendation system, first in a two-stage game and then in an infinite-horizon setting. In both cases, this system induces only partial, rather than full, information sharing among the agents (otherwise there would be too much exploitation of the risky road when costs there are low).

  • Auction-based Charging Scheduling with Deep Learning Framework for Multi-Drone Networks
    arXiv.cs.GT Pub Date : 2020-01-09
    MyungJae Shin; Joongheon Kim; Marco Levorato

    State-of-the-art drone technologies have severe flight time limitations due to weight constraints, which inevitably lead to a relatively small amount of available energy. Therefore, frequent battery replacement or recharging is necessary in applications such as delivery, exploration, or support to the wireless infrastructure. Mobile charging stations (i.e., mobile stations with charging equipment) for outdoor ad-hoc battery charging are one of the feasible solutions to address this issue. However, the ability of these platforms to charge the drones is limited in terms of the number and charging time. This paper designs an auction-based mechanism to control the charging schedule in a multi-drone setting. In this paper, charging time slots are auctioned, and their assignment is determined by a bidding process. The main challenge in developing this framework is the lack of prior knowledge on the distribution of the number of drones participating in the auction. Based on an optimal second-price auction, the proposed formulation then relies on deep learning algorithms to learn this distribution online. Numerical results from extensive simulations show that the proposed deep learning-based approach provides effective battery charging control in multi-drone scenarios.
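
    As a rough illustration of the auction step only, the sketch below allocates a single charging slot with a sealed-bid second-price rule and an optional reserve price. It is not the authors' mechanism: the deep learning component is omitted, and the function name `second_price_auction`, the drone identifiers, and the fixed `reserve` parameter are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Allocate one charging slot: highest bidder wins, pays the second-highest bid.

    Bids below the reserve are ignored. Returns None if no bid clears the reserve.
    Ties are broken in favor of the bidder listed first.
    """
    eligible = {drone: bid for drone, bid in bids.items() if bid >= reserve}
    if not eligible:
        return None
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # With a single eligible bidder, the reserve acts as the second price.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


if __name__ == "__main__":
    slot_bids = {"drone_a": 5.0, "drone_b": 8.0, "drone_c": 3.0}
    print(second_price_auction(slot_bids))               # no reserve
    print(second_price_auction(slot_bids, reserve=6.0))  # reserve binds
```

    Because the winner's payment does not depend on its own bid, bidding one's true value for the slot is a dominant strategy under this rule. In an optimal (revenue-maximizing) second-price auction the reserve depends on the bidders' value distribution, and that distribution is what the abstract says is learned online.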

  • Convergence of Large Atomic Congestion Games
    arXiv.cs.GT Pub Date : 2020-01-09
    Roberto Cominetti; Marco Scarsini; Marc Schröder; Nicolás Stier-Moses

    We study the convergence of sequences of atomic unsplittable congestion games with an increasing number of players. We consider two situations. In the first setting, each player has a weight that tends to zero, in which case the mixed equilibria of the finite games converge to the set of Wardrop equilibria of the corresponding nonatomic limit game. In the second case, players have unit weights, but participate in the game with a probability that tends to zero. In this case, the mixed equilibria converge to the set of Wardrop equilibria of another nonatomic game with suitably defined costs, which can be seen as a Poisson game in the sense of Myerson (1998). In both settings we show that the price of anarchy of the sequence of games converges to the price of anarchy of the nonatomic limit. Beyond the case of congestion games, we establish a general result on the convergence of large games with random players towards a Poisson game.

  • smartSDH: A Mechanism Design Approach to Building Control
    arXiv.cs.GT Pub Date : 2020-01-09
    Ioannis C. Konstantakopoulos; Kristy A. Hamilton; Tanya Veeravalli; Costas Spanos; Roy Dong

    As Internet of Things (IoT) technologies are increasingly being deployed, situations frequently arise where multiple stakeholders must reconcile preferences to control a shared resource. We perform a 5-month-long experiment dubbed "smartSDH" (carried out in 27 employees' office space) where users report their preferences for the brightness of overhead lighting. smartSDH implements a modified Vickrey-Clarke-Groves (VCG) mechanism; assuming users are rational, it incentivizes truthful reporting, implements the socially desirable outcome, and compensates participants to ensure higher payoffs under smartSDH when compared with the default outside option. smartSDH assesses the feasibility of the VCG mechanism in the context of smart building control and evaluates smartSDH's effect using metrics such as light level satisfaction, incentive satisfaction, and energy consumption. Despite the mechanism's theoretical properties, we found participants were significantly less satisfied with light brightness and incentives determined by the VCG mechanism over time. These data suggest the need for more realistic behavioral models to design IoT technologies and highlight difficulties in estimating preferences from observable external factors such as atmospheric conditions.
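
    For readers unfamiliar with VCG, the following sketch shows the textbook mechanism with Clarke pivot payments applied to a shared light level. It is not the modified mechanism used in smartSDH, whose details (including the compensation scheme) are not given in the abstract; the function name `vcg_light_level`, the user names, and the utility numbers are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def vcg_light_level(reports: Dict[str, List[float]]) -> Tuple[int, Dict[str, float]]:
    """Standard VCG with Clarke pivot payments over a discrete set of light levels.

    reports[user][k] is the utility that user reports for light level k.
    Returns the welfare-maximizing level and each user's payment.
    """
    if not reports:
        raise ValueError("at least one user report is required")
    users = list(reports)
    n_levels = len(reports[users[0]])

    def welfare(level: int, excluded: Optional[str] = None) -> float:
        # Total reported utility at this level, optionally leaving one user out.
        return sum(reports[u][level] for u in users if u != excluded)

    chosen = max(range(n_levels), key=welfare)
    payments = {}
    for u in users:
        best_without_u = max(welfare(k, excluded=u) for k in range(n_levels))
        # Each user pays the welfare loss their presence imposes on the others.
        payments[u] = best_without_u - welfare(chosen, excluded=u)
    return chosen, payments


if __name__ == "__main__":
    example = {"ann": [0.0, 4.0, 6.0], "bob": [5.0, 3.0, 0.0], "cat": [2.0, 3.0, 1.0]}
    print(vcg_light_level(example))
```

    In this example the middle level is chosen, and only the user whose report changes the outcome for the others makes a positive payment. That pivot structure is what makes truthful reporting a dominant strategy for rational users.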

  • A game of hide and seek in networks
    arXiv.cs.GT Pub Date : 2020-01-09
    Francis Bloch; Bhaskar Dutta; Marcin Dziubinski

    We propose and study a strategic model of hiding in a network, where the network designer chooses the links and his position in the network facing the seeker who inspects and disrupts the network. We characterize optimal networks for the hider, as well as equilibrium hiding and seeking strategies on these networks. We show that optimal networks are either equivalent to cycles or variants of core-periphery networks where every node in the periphery is connected to a single node in the core.

  • Egoistic Incentives Based on Zero-Determinant Alliances for Large-Scale Systems
    arXiv.cs.GT Pub Date : 2020-01-08
    Shengling Wang; Peizi Ma; Qin Hu; Xiuzhen Cheng; Weifeng Lv

    Social dilemmas exist in various fields and give rise to the so-called free-riding problem, leading to collective fiascos. The difficulty of tracking individual behaviors makes egoistic incentives in large-scale systems a challenging task. However, the state-of-the-art mechanisms are either individual-based or state-dependent, resulting in low efficiency in large-scale networks. In this paper, we propose an egoistic incentive mechanism from a connected (network) perspective rather than an isolated (individual) perspective by taking advantage of the social nature of people. We make use of a zero-determinant (ZD) strategy for rewarding cooperation and sanctioning defection. After proving cooperation is the dominant strategy for ZD players, we optimize their deployment to facilitate cooperation over the whole system. To further speed up cooperation, we derive a ZD alliance strategy for sequential multiple-player repeated games to empower ZD players with higher controllable leverage, which undoubtedly enriches the theoretical system of ZD strategies and broadens their application domain. Our approach is stateless and stable, which contributes to its scalability. Extensive simulations based on real-world trace data as well as synthetic data demonstrate the effectiveness of our proposed egoistic incentive approach under different networking scenarios.

  • Near-optimal Robust Bilevel Optimization
    arXiv.cs.GT Pub Date : 2019-08-12
    Mathieu Besançon; Miguel F. Anjos; Luce Brotcorne

    Bilevel optimization studies problems where the optimal response to a second mathematical optimization problem is integrated in the constraints. Such structure arises in a variety of decision-making problems in areas such as market equilibria, policy design or product pricing. We introduce near-optimal robustness for bilevel problems, protecting the upper-level decision-maker from bounded rationality at the lower level and show it is a restriction of the corresponding pessimistic bilevel problem. Essential properties are derived in generic and specific settings. This model finds a corresponding and intuitive interpretation in various situations cast as bilevel optimization problems. We develop a duality-based solution method for cases where the lower level is convex, leveraging the methodology from robust and bilevel literature. The models obtained are tested numerically using different solvers and formulations, showing the successful implementation of the near-optimal bilevel problem.

  • Selfish Behavior in the Tezos Proof-of-Stake Protocol
    arXiv.cs.GT Pub Date : 2019-12-06
    Michael Neuder; Daniel J. Moroz; Rithvik Rao; David C. Parkes

    Proof-of-Stake consensus protocols give rise to complex modeling challenges. We analyze the recently-updated Tezos Proof-of-Stake protocol and demonstrate that, under certain conditions, rational participants are incentivized to behave dishonestly. In doing so, we provide a theoretical analysis of the feasibility and profitability of a block stealing attack that we call selfish endorsing, a concrete instance of an attack previously only theoretically considered. We propose and analyze a simple change to the Tezos protocol which significantly reduces the (already small) profitability of this dishonest behavior, and introduce a new delay and reward scheme that is provably secure against length-1 and length-2 selfish endorsing attacks. Our framework provides a template for analyzing other Proof-of-Stake implementations for selfish behavior.

  • Price Competition with LTE-U and WiFi
    arXiv.cs.GT Pub Date : 2020-01-06
    Xu Wang; Randall Berry

    LTE-U is an extension of the Long Term Evolution (LTE) standard for operation in unlicensed spectrum. LTE-U differs from WiFi, the predominant technology used in unlicensed spectrum, in that it utilizes a duty cycle mode for accessing the spectrum and allows for a more seamless integration with LTE deployments in licensed spectrum. There have been a number of technical studies on the co-existence of LTE-U and WiFi in unlicensed spectrum. In this paper, we instead investigate the impact of such a technology from an economic perspective. We consider a model in which an incumbent service provider (SP) deploys a duty cycle-based technology like LTE-U in an unlicensed band along with operating in a licensed band and competes with one or more entrants that only operate in the unlicensed band using a different technology like WiFi. We characterize the impact of a technology like LTE-U on the market outcome and show that the welfare impacts of this technology are subtle, depending in part on the amount of unlicensed spectrum and number of entrants. The difference in spectral efficiency between LTE and WiFi also plays a role in the competition among SPs. Finally, we investigate the impact of the duty cycle and the portion of unlicensed spectrum used by the technology.

  • Optimal versus Nash Equilibrium Computation for Networked Resource Allocation
    arXiv.cs.GT Pub Date : 2014-04-14
    S. Rasoul Etesami

    Motivated by emerging resource allocation and data placement problems such as web caches and peer-to-peer systems, we consider and study a class of resource allocation problems over a network of agents (nodes). In this model, nodes can store only a limited number of resources while accessing the remaining ones through their closest neighbors. We consider this problem under both optimization and game-theoretic frameworks. In the case of optimal resource allocation we will first show that when there are only k=2 resources, the optimal allocation can be found efficiently in O(n^2\log n) steps, where n denotes the total number of nodes. However, for k>2 this problem becomes NP-hard with no polynomial time approximation algorithm with a performance guarantee better than 1+1/102k^2, even under metric access costs. We then provide a 3-approximation algorithm for the optimal resource allocation which runs only in linear time O(n). Subsequently, we look at this problem under a selfish setting formulated as a noncooperative game and provide a 3-approximation algorithm for obtaining its pure Nash equilibria under metric access costs. We then establish an equivalence between the set of pure Nash equilibria and flip-optimal solutions of the Max-k-Cut problem over a specific weighted complete graph. Using this reduction, we show that finding the lexicographically smallest Nash equilibrium for k> 2 is NP-hard, and provide an algorithm to find it in O(n^3 2^n) steps. While the reduction to weighted Max-k-Cut suggests that finding a pure Nash equilibrium using best response dynamics might be PLS-hard, it allows us to use tools from quadratic programming to devise more systematic algorithms towards obtaining Nash equilibrium points.

  • A comparison of penalty shootout designs in soccer
    arXiv.cs.GT Pub Date : 2018-06-04
    László Csató

    The penalty shootout in soccer is recognized to be unfair because the team kicking first in all rounds enjoys a significant advantage. The so-called Catch-Up Rule has been suggested recently to solve this problem but is shown here not to be fairer than the simpler deterministic Alternating (ABBA) Rule that has already been tried. We introduce the Adjusted Catch-Up Rule by guaranteeing the first penalty of the possible sudden death stage to the team disadvantaged in the first round. It outperforms the Catch-Up and Alternating Rules, while remaining straightforward to implement. A general measure of complexity for penalty shootout mechanisms is also provided as the minimal number of binary questions required to decide the first-mover in a given round without knowing the history of the penalty shootout. This quantification permits a two-dimensional evaluation of any mechanism proposed in the future.

Contents have been reproduced by permission of the publishers.