Neurocomputing, Volume 437, 21 May 2021, Pages 31-41

Optimal pricing in black box producer-consumer Stackelberg games using revealed preference feedback

https://doi.org/10.1016/j.neucom.2021.01.026

Abstract

This paper considers an optimal pricing problem for the black box producer-consumer Stackelberg game. A producer sets the price over a set of goods to maximize profit (the difference between revenue and cost). The consumer buys a quantity that maximizes the difference between the value of the quantity consumed and its cost. The value function of the consumer and the cost function of the producer are ‘black box’ functions (unknown functions with limited or costly evaluations). Using Gaussian processes, Bayesian optimization and Bayesian quadrature, we derive an algorithm for learning the optimal price. The method has the following significant advantages: (i) it is efficient and scales well compared to existing techniques, (ii) the cost function of the producer may be non-convex, (iii) the value function and/or cost function may be time varying. We illustrate optimal pricing in electricity markets using a real dataset.

Introduction

Stackelberg games have been used to study sequential decision making in fields such as inventory and production control, wholesale and retail pricing strategies, outsourcing, and advertising. In a two-person Stackelberg game, the leading player (or leader) announces an action first and the other player, the follower, chooses an action to optimize performance, given the policy of the leader. If the leader knows the objective function of the follower, then the leader anticipates the response of the follower and picks an action that optimizes its performance. The focus of this paper is the case where the objective functions of the leader and follower are ‘black box’ functions; i.e. they are unknown and the number of functional evaluations is limited by time or cost.

Context: In this paper, we consider the following canonical example of a Stackelberg game (see Section 2.1 for details). An iterated game is played between a producer and a consumer. At each iteration, the producer sets the price p. In response, the consumer buys the quantity x that maximizes v(x) - px, where v(·) is the value function. The value function (or, more generally, utility function) captures the preferences of the consumer; see [1] for a comprehensive treatment of consumer utility theory. The goal of the producer is to learn the unknown value function of the consumer and set prices that maximize the profit px - c(x), where c(·) is the cost function of the producer. The value function of the consumer and the cost function of the producer are black box functions: they are unknown, and evaluations are limited or costly. The consumer value function is black box due to:

  • (i) the unknown value function of the consumer, and
  • (ii) the limited budget for price changes; for example, a producer selling seasonal or style goods is limited by the sales period.

The black box assumption on the producer’s cost function is motivated mainly by ‘switching cost’, i.e. the supply chain cost associated with changing suppliers as production quantities change. The producer has access to the history of prices and quantities purchased (the ‘revealed preference’ dataset) and to evaluations of its cost function in order to compute the optimal price.1
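To make the interaction loop concrete, the following sketch simulates a few rounds of the iterated game. The quadratic value function, the non-convex cost, and all constants here are hypothetical illustrations (not models from the paper); with v(x) = Ax - Bx²/2, the consumer's best response has the closed form x(p) = max(0, (A - p)/B).

```python
import math

# Hypothetical quadratic consumer value function v(x) = A*x - B*x**2/2
# (hidden from the producer); its best response to price p is the
# closed form x(p) = max(0, (A - p)/B).
A, B = 10.0, 2.0

def consumer_best_response(p):
    return max(0.0, (A - p) / B)

def producer_cost(x):
    # Illustrative non-convex cost, known to the producer only via evaluation.
    return x + 0.5 * math.sin(x)

# The producer observes only (price, quantity) pairs -- the revealed
# preference dataset -- together with evaluations of its own cost.
history = []
for p in [2.0, 4.0, 6.0]:        # a limited budget of price changes
    x = consumer_best_response(p)
    history.append((p, x, p * x - producer_cost(x)))
```

Each tuple in `history` is one round of revealed preference feedback: the price announced, the quantity the consumer bought, and the resulting profit.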

Related Work: The problem of estimating an unknown utility (or value) function can be traced back to the area of revealed preference in microeconomics and, more recently, to adversarial signal processing [2], [3]. Major contributions to revealed preference in microeconomics are due to Samuelson, Afriat [4] and Varian; see [5] for a recent survey. Given a revealed preference dataset, consisting of prices and the corresponding quantities consumed by an agent, Afriat [4] devised a method to reconstruct the utility function. The revealed preference framework was later extended to games in [6], [7]. The major drawbacks of Afriat's revealed preference approach (and its extensions) are:

  • (i) it is an offline approach,
  • (ii) it does not handle time variations in the value function,
  • (iii) it does not account for costly functional evaluations.

Another line of work, in the context of games, is empirical game-theoretic analysis [8], [9], where the dataset consists of observations of strategy profiles and their corresponding payoffs (or utility values). This differs from our approach, in which the utility function is learnt from player strategies alone (i.e. from revealed preference feedback).

More recent work exploits the concavity of the value function and the convexity of the cost function, and uses a gradient learning approach to compute the optimal price [10], [11]. Under these strong assumptions, the authors devise a polynomial time algorithm for computing the optimal strategy of the leader. Whilst concavity of the value function is well motivated by the rationality of the follower, modern production features such as indivisibilities, economies of scale and technology specialization make cost functions non-convex [12]. Another major drawback is that the existing algorithms are not black box compatible, i.e. they do not account for limited or costly evaluations of the value or cost function.

This work considers the pricing problem of a producer repeatedly interacting with a single consumer. When there are multiple consumers with differing value functions, [13] proposes algorithms to learn certain parametric classes of value functions from aggregate revealed preference feedback. Similarly, for multiple consumers, [14], [15] consider social welfare maximization (the difference between the sum of the consumer value functions and the producer cost function) instead of profit maximization. Another central assumption in this work is that the consumer is truthful to its utility function; an adversarial consumer can deceive the producer to obtain an unfair advantage, and such problems under convexity assumptions are considered in [16], [17]. Other works consider the problem under practical constraints such as pricing restrictions [18], reserve price constraints [19], and network constraints [20]. However, to the best of our knowledge, none of the existing literature considers non-convex cost functions or time varying value functions – the main focus of this paper.

Main Results and Organization: Section 2.1 discusses the producer-consumer Stackelberg game and the key assumptions; Sections 2.2 (The Gaussian Process (GP)) and 2.3 (Bayesian Quadrature) review the Gaussian process framework and Bayesian quadrature, the essential tools adopted in this paper. Section 3 contains the following main results:

  • 1. In Sections 3.1–3.4, we derive a data-efficient non-parametric Bayesian learning algorithm (Algorithm 1) to set the optimal price for the producer-consumer Stackelberg game with black box value and cost functions. In contrast to existing approaches, Algorithm 1 makes no convexity assumption on the cost function.
  • 2. Section 3.5 extends Algorithm 1 so that the optimal pricing tracks a time varying value or cost function.

In Section 3.6, we discuss, briefly, the extension to more general Stackelberg games. Section 4 presents the following numerical results:
  • 1. Sections 4.1 and 4.2 illustrate, using synthetic examples, the advantage of our approach in terms of the number of functional evaluations. When evaluations of the value and/or cost function are costly, Algorithm 1 converges to the optimal price in fewer iterations than existing methods.
  • 2. Section 4.3 illustrates the advantage of the framework of Algorithm 1 for a time varying value function.
  • 3. Optimal pricing in an electricity market: using a real dataset from the Ontario power grid, Section 4.4 shows how to learn the value function of a (representative) electricity consumer and hence set optimal prices in energy markets.

Finally, Section 5 offers concluding remarks.
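Since Bayesian quadrature is one of the core tools reviewed in Section 2.3, a minimal sketch may help: under a GP prior on the integrand with an RBF kernel, the posterior mean of the integral Z = ∫ f(x) dx is zᵀK⁻¹y, where z_i = ∫ k(x, x_i) dx has a closed form via the error function. The kernel hyperparameters, the toy integrand sin(x), and the interval below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from math import erf, sqrt, pi

ell, sf = 0.3, 1.0          # illustrative RBF kernel hyperparameters

def k(a, b):
    # Squared-exponential (RBF) kernel between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def kernel_mean(X, lo, hi):
    # z_i = integral over [lo, hi] of k(x, x_i) dx, closed form for the RBF kernel.
    c = sf**2 * ell * sqrt(pi / 2.0)
    s = ell * sqrt(2.0)
    return np.array([c * (erf((hi - xi) / s) - erf((lo - xi) / s)) for xi in X])

lo, hi = 0.0, 1.0
X = np.linspace(lo, hi, 8)
y = np.sin(X)                           # evaluations of the toy integrand
z = kernel_mean(X, lo, hi)
K = k(X, X) + 1e-6 * np.eye(len(X))     # jitter for numerical stability
Z_hat = z @ np.linalg.solve(K, y)       # posterior mean of the integral
```

Here `Z_hat` approximates ∫₀¹ sin(x) dx = 1 − cos(1) from only eight integrand evaluations, which is the sample-efficiency property that makes Bayesian quadrature attractive when evaluations are costly.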

Section snippets

Stackelberg games

An iterated game is played between a producer and a consumer. At each iteration, the producer decides the price p ∈ R_+^d, where d is the number of goods. In response to the price, the consumer buys the consumption bundle x(p) ∈ R_+^d, such that

x(p) = argmax_{x ∈ R_+^d} v(x) − p^T x,

where v(·) is an unknown value function. The producer observes the purchased quantity. The profit of the producer is thus given by

r(x) = p^T x − c(x),

where c(·) is the cost function. The objective of the producer is to set a price that maximizes this profit.

Learning the optimal strategy in Stackelberg games

In this section, we use Gaussian processes to model the black box cost and value functions and to learn the optimal price. For clarity of exposition, we consider the case d = 1. The extension to the general case is dealt with in Section 4.2.
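To convey the flavor of GP-based price search for d = 1, here is a minimal Bayesian optimization sketch. Unlike Algorithm 1, which models the value and cost functions separately, this simplified version places a GP directly on the profit as a function of price and selects prices with an upper confidence bound (UCB) acquisition rule; the profit oracle, kernel, and all constants are hypothetical.

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    # Squared-exponential kernel between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(P, y, grid, noise=1e-4):
    # GP posterior mean/variance on `grid`, using the empirical mean of y
    # as a constant prior mean.
    m = y.mean()
    K = rbf(P, P) + noise * np.eye(len(P))
    Ks = rbf(P, grid)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - m))
    mu = m + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(grid, grid)) - np.sum(v**2, axis=0)
    return mu, np.maximum(var, 0.0)

def profit(p):
    # Hypothetical profit oracle, hidden from the learner: the consumer's
    # best response to a quadratic value function, minus a non-convex cost.
    x = max(0.0, (10.0 - p) / 2.0)
    return p * x - (x + 0.5 * np.sin(x))

grid = np.linspace(0.1, 10.0, 200)
P = np.array([1.0, 9.0])                 # initial price probes
y = np.array([profit(p) for p in P])

for _ in range(15):                      # Bayesian optimization loop
    mu, var = gp_posterior(P, y, grid)
    p_next = grid[np.argmax(mu + 2.0 * np.sqrt(var))]   # UCB acquisition
    P = np.append(P, p_next)
    y = np.append(y, profit(p_next))

p_star = P[np.argmax(y)]                 # best price found so far
```

The UCB rule balances exploiting prices with high predicted profit against exploring prices where the posterior is uncertain, which is why such methods converge with few functional evaluations.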

Numerical results

Section 4.1 considers the case d = 1 with a non-convex cost function. Section 4.2 extends the techniques of Section 3 to d > 1 and illustrates the efficiency of Algorithm 1 in terms of functional evaluations. In both sections, we compare Algorithm 1 with the revealed preference approach [5] and the gradient based approach of [11]. Section 4.3 considers a synthetic time varying example; the revealed preference and gradient based approaches cannot handle time varying functions. Finally, Section 4.4 applies the framework to optimal pricing in an electricity market using a real dataset from the Ontario power grid.

Conclusion & future work

In this paper, we considered the problem of optimal price setting for producer-consumer games in the context of black box value and cost functions. Algorithm 1 details a Bayesian algorithm to compute the optimal price using weaker assumptions than the existing literature. In addition, the framework of Algorithm 1 can be readily extended to time varying value and cost functions. Numerical results illustrate its efficiency compared to existing gradient based and revealed preference approaches.

CRediT authorship contribution statement

Anup Aprem: Conceptualization, Methodology, Software, Writing - original draft. Stephen J. Roberts: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (42)

  • H.R. Varian, Revealed preference, Samuelsonian Economics and the Twenty-First Century (2006)...
  • W. Hoiles et al., PAC algorithms for detecting Nash equilibrium play in social networks: from Twitter to energy markets, IEEE Access (2016)
  • M.P. Wellman, Methods for empirical game-theoretic analysis, in: Proceedings of the 21st National Conference on...
  • P.R. Jordan, L.J. Schvartzman, M.P. Wellman, Strategy exploration in empirical games, in: Proceedings of the 9th...
  • K. Amin, R. Cummings, L. Dworkin, M. Kearns, A. Roth, Online learning and profit maximization from revealed...
  • A. Roth, J. Ullman, Z.S. Wu, Watch and learn: optimizing from revealed preferences feedback, in: Proceedings of the...
  • K. Kerstens, S. Managi, Total factor productivity growth and convergence in the petroleum industry: empirical analysis...
  • X. Bei et al., Learning market parameters using aggregate demand queries
  • A. Roth et al., Multidimensional dynamic pricing for welfare maximization
  • Z. Ji et al., Social welfare and profit maximization from revealed preferences
  • J. Gan et al., Manipulating a learning defender and ways to counteract


Anup Aprem is an Assistant Professor in the ECE Department at the National Institute of Technology, Calicut, India. Prior to this, he was a postdoctoral researcher in the Machine Learning group at the University of Oxford from June 2018 to March 2020. Anup received a Ph.D. in Electrical and Computer Engineering in 2018 from the University of British Columbia, Vancouver, Canada and holds an M.E. from the Indian Institute of Science, Bangalore, India. His research interests are broadly in statistical signal processing, stochastic decision making and Bayesian machine learning, with applications to sensor networks, social media and networks, and multi-agent systems.

Professor Stephen Roberts is the Royal Academy of Engineering/Man Group Professor of Machine Learning at the University of Oxford. He is Director of the Oxford-Man Institute of Quantitative Finance and co-founder of the Oxford Machine Learning spin-out company, Mind Foundry. Stephen’s focus lies in the theory and methodology of machine learning for large-scale real-world problems, especially those in which noise and uncertainty abound. He has successfully applied these approaches to a wide range of problem domains including astronomy, biology, finance, engineering, control, sensor networks and system monitoring. His current research interests include the application of machine learning in finance and the engineering industry, as well as a range of theoretical and methodological problems.

A shorter version of this paper was presented at the 29th IEEE International Workshop on Machine Learning for Signal Processing, October 13–16, 2019, Pittsburgh, PA, USA.
