Testing probabilistic models of choice using column generation

https://doi.org/10.1016/j.cor.2018.03.001

Highlights

  • We describe a Column Generation algorithm for testing models of probabilistic choice.

  • We investigate the impact of several implementation choices for the algorithm.

  • We show the Column Generation algorithm is well-suited for computing Bayes factors.

Abstract

In so-called random preference models of probabilistic choice, a decision maker chooses according to an unspecified probability distribution over preference states. The most prominent case arises when preference states are linear orders or weak orders of the choice alternatives. The literature has documented that actually evaluating whether decision makers’ observed choices are consistent with such a probabilistic model of choice poses computational difficulties. This severely limits the possible scale of empirical work in behavioral economics and related disciplines. We propose a family of column generation based algorithms for performing such tests. We evaluate our algorithms on various sets of instances. We observe substantial improvements in computation time and conclude that we can efficiently test substantially larger data sets than previously possible.

Introduction

We consider computational challenges that arise when testing a certain type of probabilistic models of choice behavior. Imagine a decision maker who must specify a best element out of a set of distinct alternatives. In such situations, decision makers do not consistently select the same alternative as best, even when presented with the same (or nearly the same) set of alternatives (see, e.g., Tversky, 1969). Thus, assuming that a decision maker acts deterministically using a single decision rule (say, some linear order of the alternatives) is unrealistic. Probabilistic models of choice, pioneered by Block and Marschak (1960) and Luce (1959), attempt to explain uncertainty and fluctuations in behavior through probabilistic specifications. We concentrate on a class of models in which the permissible preference states are linear orders or weak orders of the alternatives. These are prominent cases in the ongoing research about rationality of preferences in behavioral economics, psychology, neuroscience and zoology (Arbuthnott et al., 2017; Brown et al., 2015; Regenwetter et al., 2011; Regenwetter and Davis-Stober, 2012). The random preference model captures the decision maker’s uncertainty about preference with a probability distribution over these preference states. The probability of choosing a particular alternative is governed by that probability distribution over preference states.

In a seminal contribution, McFadden and Richter (1990) provided several equivalent (sets of) conditions for choice probabilities to be consistent with such a probabilistic model of choice. However, actually checking these conditions on choice probabilities poses computational challenges. Indeed, straightforwardly evaluating the “axiom of revealed stochastic preference” and the “Block-Marschak polynomials” both require checking a number of conditions that is exponential in the number of choice alternatives. Likewise, the system of linear inequalities and the linear programs given in McFadden and Richter (1990) contained one variable for every preference state. The resulting number of variables grows exponentially in the number of alternatives, for most classes of preference states, including for linear orders. Even so, this linear programming model forms the basis of our column generation approach.
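To give a feel for how quickly the number of variables grows, the following minimal sketch counts the preference states for small numbers of alternatives. It assumes that linear orders are counted by n! and weak orders by the ordered Bell (Fubini) numbers; the function names are illustrative and do not come from the paper.

```python
from math import comb, factorial


def count_linear_orders(n: int) -> int:
    """Number of linear orders (strict rankings) of n alternatives."""
    return factorial(n)


def count_weak_orders(n: int) -> int:
    """Number of weak orders (rankings with ties) of n alternatives.

    Uses the ordered Bell number recurrence: pick the k alternatives
    tied in the top class, then weakly order the remaining m - k.
    """
    counts = [1]  # one (empty) weak order on zero alternatives
    for m in range(1, n + 1):
        counts.append(sum(comb(m, k) * counts[m - k] for k in range(1, m + 1)))
    return counts[n]


# One LP variable per preference state: the table grows very quickly.
for n in range(2, 11):
    print(n, count_linear_orders(n), count_weak_orders(n))
```

Under these assumptions, 10 alternatives already give several million linear orders, which illustrates why enumerating every variable of the linear program up front becomes impractical.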

Most work on these probabilistic models has been on binary choice induced by linear orders. More precisely, the probability that a person chooses an alternative i over an alternative j, when required to choose one of the two, is the marginal probability of all linear orders in which i is preferred to j. Block and Marschak (1960) described two classes of inequalities and proved that these inequalities are necessary and sufficient conditions for consistency with the probabilistic model of choice for data sets with up to 3 choice alternatives. Dridi (1980) proved that these conditions are also necessary and sufficient for data sets with up to 5 alternatives and showed that they are no longer sufficient for data sets with 6 or more alternatives. Megiddo (1977) proved that testing data sets for consistency with probabilistic choice induced by linear orders is difficult in general. He showed that the problem is equivalent to testing membership of a given point in the linear ordering polytope. Since optimization and separation over a particular polytope are polynomially equivalent (see Grötschel et al., 1993), it follows that testing whether a given collection of choice probabilities is consistent with a probabilistic model of choice induced by linear orders is NP-complete. In the last decade, researchers have generated extensive knowledge on the facial description of the linear ordering polytope (see Doignon et al., 2006; Fiorini, 2006; the survey by Charon and Hudry, 2007; and the book by Martí and Reinelt, 2011, as well as the references contained therein).
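The marginalization described above can be illustrated with a small sketch. It assumes a distribution given as a dictionary from linear orders (tuples listed from most to least preferred) to probabilities. The second function checks the triangle inequalities p_ij + p_jk - p_ik ≤ 1, which we take here, as an assumption from the wider literature on the linear ordering polytope, to be one of the two classes of inequalities mentioned. All names are illustrative.

```python
from itertools import permutations


def binary_choice_probs(dist):
    """Marginal probability that i is preferred to j, for every ordered pair.

    dist maps linear orders (tuples, most preferred first) to probabilities.
    """
    probs = {}
    for order, weight in dist.items():
        for pos, i in enumerate(order):
            for j in order[pos + 1:]:
                probs[(i, j)] = probs.get((i, j), 0.0) + weight
    return probs


def satisfies_triangle_inequalities(p, alternatives, tol=1e-9):
    """Check p_ij + p_jk - p_ik <= 1 for all distinct i, j, k (assumed form)."""
    for i, j, k in permutations(alternatives, 3):
        lhs = p.get((i, j), 0.0) + p.get((j, k), 0.0) - p.get((i, k), 0.0)
        if lhs > 1 + tol:
            return False
    return True


# A mixture of three linear orders over the alternatives a, b, c.
dist = {("a", "b", "c"): 0.5, ("b", "c", "a"): 0.3, ("c", "a", "b"): 0.2}
p = binary_choice_probs(dist)
print(p[("a", "b")], satisfies_triangle_inequalities(p, "abc"))

# Strongly cyclic choice probabilities cannot come from such a mixture.
cyclic = {("a", "b"): 0.9, ("b", "a"): 0.1, ("b", "c"): 0.9,
          ("c", "b"): 0.1, ("c", "a"): 0.9, ("a", "c"): 0.1}
print(satisfies_triangle_inequalities(cyclic, "abc"))
```

Probabilities generated from an actual mixture pass the check by construction, while the cyclic example fails it. For 6 or more alternatives, passing such a check would no longer guarantee consistency, as noted above.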

When carrying out tests of probabilistic models of choice, scholars usually circumvent the computational challenges that arise when the number of alternatives grows large. Human laboratory experiments keep the number of alternatives small (see, e.g., Brown et al., 2015; Cavagnaro and Davis-Stober, 2014; Regenwetter et al., 2011; Regenwetter and Davis-Stober, 2012, who used sets of 5 alternatives). Kitamura and Stoye (2014) tested a probabilistic version of the “strong axiom of revealed preference,” using data from the U.K. Family Expenditure Survey, which they partitioned into subsets of a manageable size. One benefit of our proposed methodology is that it will allow researchers to design studies with larger numbers of choice alternatives, which will, in turn, increase their realism and generalizability. While testing probabilistic choice models is difficult in general, it becomes easy for some settings and classes of preference states. Matzkin (2007) and Hoderlein and Stoye (2014) provided conditions for a probabilistic version of the so-called “weak axiom of revealed preference.” Davis-Stober (2012) described a set of linear inequalities that are necessary and sufficient conditions for probabilistic choice induced by certain heuristic preferences. Smeulders (2018) provided necessary and sufficient conditions for a probabilistic model induced by single-peaked linear orders. Guo and Regenwetter (2014), Regenwetter et al. (2014), and Regenwetter and Robinson (2017) evaluated various sets of necessary and sufficient conditions for binary choice probabilities. For all of these settings, the conditions can be tested in polynomial time.

Here, we propose a family of algorithms based on column generation to test various probabilistic models of choice and apply it to a model induced by linear orders. Column generation is a technique to efficiently solve linear programs with a large number of variables; we come back to this technique in Section 3. Traditionally, the technique of column generation has almost always been applied to optimization problems. Here, however, we use it for a decision problem, namely, to detect whether given choice probabilities satisfy the probabilistic model of choice or not (i.e., a yes/no answer). We show how this affects the algorithm. The rest of this paper unfolds as follows. In Section 2, we lay out the notation, the definitions and the model that we use. Section 3 provides a basic description of the column generation algorithms. Section 4 discusses the implementation of a family of such algorithms and reviews results from computational experiments. In Section 5 we show that when testing the model for many similar choice probabilities, the column generation algorithm can use output from one test to speed up subsequent tests. We illustrate how this is useful for statistical analysis of probabilistic models, e.g., for calculating the Bayes factor to evaluate statistical performance on laboratory data from human subjects. We conclude in Section 6.
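As a rough illustration of how a decision-oriented column generation scheme might look for new columns, the sketch below shows only a pricing step. It assumes, without confirmation from the text above, that each column corresponds to a linear order and that pricing means finding an order whose pairwise weights sum to more than a given threshold. The functions `order_value`, `price_by_insertion` and `price_exactly`, as well as the weights and thresholds, are hypothetical; the local search is a generic insertion heuristic, not necessarily one used by the authors.

```python
from itertools import permutations


def order_value(order, weights):
    """Sum of weights[(i, j)] over all pairs with i ranked before j."""
    return sum(weights.get((i, j), 0.0)
               for pos, i in enumerate(order)
               for j in order[pos + 1:])


def price_by_insertion(alternatives, weights, threshold):
    """Heuristic pricing: insertion local search with early exit.

    Returns an order whose value exceeds the threshold, or None if the
    search gets stuck in a local optimum at or below the threshold.
    """
    order = list(alternatives)
    best = order_value(order, weights)
    improved = True
    while best <= threshold and improved:
        improved = False
        for src in range(len(order)):
            for dst in range(len(order)):
                if src == dst:
                    continue
                cand = order[:src] + order[src + 1:]
                cand.insert(dst, order[src])
                val = order_value(cand, weights)
                if val > best + 1e-12:  # accept the first improving move
                    order, best, improved = cand, val, True
                    break
            if improved:
                break
    return tuple(order) if best > threshold else None


def price_exactly(alternatives, weights, threshold):
    """Exact pricing by enumeration; only viable for very few alternatives."""
    best = max(permutations(alternatives), key=lambda o: order_value(o, weights))
    return best if order_value(best, weights) > threshold else None


# Hypothetical weights, e.g. taken from the duals of a restricted problem.
weights = {("a", "b"): -1.0, ("b", "a"): 1.0, ("b", "c"): 0.5,
           ("c", "b"): -0.5, ("a", "c"): 0.2, ("c", "a"): -0.2}
print(price_by_insertion("abc", weights, threshold=1.0))
print(price_exactly("abc", weights, threshold=2.0))
```

Because only a yes/no answer is needed, the heuristic can stop as soon as any order exceeds the threshold rather than searching for the best one. An exact method is needed only to confirm that no such order exists.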

Section snippets

Notation and definitions

Consider a set A, consisting of n many alternatives and let AA = {(i, j) | i, j ∈ A, i ≠ j} denote the collection of all ordered pairs of distinct elements of A. For each ordered pair of distinct alternatives (i, j) ∈ AA, we are given a nonnegative number p_{i,j} ≤ 1. These numbers represent the probabilities that i is chosen over j for all distinct i and j in A. For now, we concentrate on two-alternative forced choice, that is, the case in which a person must choose one alternative or the other when

Column generation

In this section, we describe an algorithm based on column generation to detect whether a given data set can be rationalized by the mixture model. Column generation is a technique dating back to Gilmore and Gomory (1961) who used it to solve cutting stock problems. The advantage of using column generation is that we do not have to consider all of the variables at once; instead, we repeatedly solve a linear program of limited size (the so-called restricted master), and we solve a so-called

Implementation

In this section we discuss the implementation of the column generation algorithm. On the one hand, we can run Algorithm 1 by making use of integer programming solvers to solve the pricing problem. However, on the other hand, fast heuristics may already give solutions with a value exceeding the threshold. We describe these heuristic algorithms in Section 4.1. Section 4.2 then contains descriptions of our data sets. Finally, Section 4.3 gives results on computation times for the various

Testing many similar data sets: Bayes factor calculation

In this section, we discuss an application of the column generation algorithm to a statistical problem described by Cavagnaro and Davis-Stober (2014). Those authors used the Bayes factor (Klugkist and Hoijtink, 2007) for statistical model evaluation, model selection, and model competition on data from human subject experiments in the laboratory. The calculation of the Bayes factor requires evaluating a large number of data sets against the conditions of the probabilistic models of choice.
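The reason so many membership tests are needed can be sketched as follows. The example assumes an encompassing-prior style estimate in which the Bayes factor is approximated by the fraction of posterior draws satisfying the model's constraints divided by the fraction of prior draws doing so. It further assumes uniform priors with independent Beta posteriors for each pair, and uses a closed-form membership test for three alternatives in place of the column generation test. The function names, counts and number of draws are hypothetical.

```python
import random


def in_model_three_alternatives(p):
    """Membership test for 3 alternatives, assuming p_ji = 1 - p_ij.

    The triangle inequalities then reduce to 0 <= p_ab + p_bc - p_ac <= 1.
    """
    s = p[("a", "b")] + p[("b", "c")] - p[("a", "c")]
    return 0.0 <= s <= 1.0


def estimate_bayes_factor(counts, trials, membership_test, draws=20000, seed=0):
    """Monte Carlo estimate: posterior hit rate divided by prior hit rate.

    counts[(i, j)] is how often i was chosen over j in `trials` repetitions.
    """
    rng = random.Random(seed)
    pairs = sorted(counts)
    prior_hits = 0
    posterior_hits = 0
    for _ in range(draws):
        prior = {pair: rng.random() for pair in pairs}
        posterior = {pair: rng.betavariate(1 + counts[pair],
                                           1 + trials - counts[pair])
                     for pair in pairs}
        prior_hits += membership_test(prior)      # one test per prior draw
        posterior_hits += membership_test(posterior)  # one per posterior draw
    return (posterior_hits / draws) / (prior_hits / draws)


# Hypothetical choice counts out of 20 repetitions per pair.
consistent = {("a", "b"): 16, ("b", "c"): 15, ("a", "c"): 18}
cyclic = {("a", "b"): 18, ("b", "c"): 18, ("a", "c"): 2}
print(estimate_bayes_factor(consistent, 20, in_model_three_alternatives))
print(estimate_bayes_factor(cyclic, 20, in_model_three_alternatives))
```

Every draw triggers one membership test, so tens of thousands of closely related tests are run. With more alternatives, each test would itself be a call to the column generation algorithm, which is where reusing output between similar tests could pay off.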

Conclusion

In this paper, we have presented an algorithm for testing models of probabilistic preferences (mixture models), based on column generation. This algorithm is capable of handling data sets of such size that the number of linear orders over all alternatives, and thus the number of variables in Formulation (1) would make the system of equalities prohibitive to solve. In the appendix, we have shown how to modify this algorithm to handle other types of data and other models of choice. We have

Acknowledgments

This paper is based on a PhD thesis chapter of the first author. We thank the referees and the doctoral committee, in particular Prof. Yves Crama, for helpful comments and discussions. This research was supported by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office, by National Science Foundation grants SES-14-59866 (PI: Davis-Stober) & SES-14-59699 (PI: Regenwetter), by National Institutes of Health grant K25AA024182 (PI: Davis-Stober) and by the

References (33)

  • I. Charon et al. A survey on the linear ordering problem for weighted or unweighted tournaments. 4OR (2007)
  • V. Chvátal. Linear Programming. (1983)
  • T. Dridi. Sur les distributions binaires associées à des distributions ordinales. Mathématiques et Sciences Humaines (1980)
  • J. Farkas. Theorie der einfachen Ungleichungen. Journal für die reine und angewandte Mathematik (1902)
  • S. Fiorini. {0, 1/2}-Cuts and the linear ordering problem: surfaces that define facets. SIAM J. Discrete Math. (2006)
  • P. Gilmore et al. A linear programming approach to the cutting-stock problem. Oper. Res. (1961)
    1 The author is a post-doctoral fellow of the F.R.S.-FNRS.
