Multitask learning deep neural networks to combine revealed and stated preference data

https://doi.org/10.1016/j.jocm.2020.100236

Abstract

It is an enduring question how to combine revealed preference (RP) and stated preference (SP) data to analyze individual choices. While the nested logit (NL) model is the classical way to address the question, this study presents multitask learning deep neural networks (MTLDNNs) as an alternative framework, and discusses their theoretical foundation, empirical performance, and behavioral intuition. We first demonstrate that MTLDNNs are theoretically more general than the NL models because of MTLDNNs' automatic feature learning, flexible regularizations, and diverse architectures. By analyzing the adoption of autonomous vehicles (AVs), we illustrate that the MTLDNNs outperform the NL models in terms of prediction accuracy but underperform in terms of cross-entropy losses. To interpret the MTLDNNs, we compute the elasticities and visualize the relationship between choice probabilities and input variables. The MTLDNNs reveal that AVs mainly substitute driving and ride hailing, and that the variables specific to AVs are more important than the socio-economic variables in determining AV adoption. Overall, this work demonstrates that MTLDNNs are theoretically appealing in leveraging the information shared by RP and SP and capable of revealing meaningful behavioral patterns, although their performance gain over the classical NL model is still limited. To improve upon this work, future studies can investigate the inconsistency between prediction accuracy and cross-entropy losses, novel MTLDNN architectures, regularization design for the RP-SP question, MTLDNN applications to other choice scenarios, and deeper theoretical connections between choice models and the MTLDNN framework.

Introduction

For decades, researchers have been combining revealed preference (RP) and stated preference (SP) data to analyze individual behavior, owing to their complementary properties. RP data are thought to have stronger external validity but often lack the variation in attributes or alternatives, while SP data often incorporate new attributes or alternatives but lack strong external validity. As a classical method, the nested logit (NL) model has been commonly used to combine RP and SP by assigning their alternatives to two nests with different utility scale factors (Hensher and Bradley, 1993; Bradley and Daly, 1997; Ben-Akiva and Morikawa, 1990; Ben-Akiva et al., 1994).1 However, in the NL model, researchers need to analyze RP and SP by handcrafting the model structure, which can be too restrictive to capture the complex data generating process. This handcrafted feature engineering is different from the mechanism in deep neural networks (DNNs) (LeCun et al., 2015; Bengio et al., 2013; Collobert and Weston, 2008), which can automatically learn generalizable features to achieve outstanding predictive performance across disciplines (Fernández-Delgado et al., 2014; Krizhevsky et al., 2012; LeCun et al., 2015). The recent innovations in DNNs prompt us to investigate the possibility of using a DNN framework to address the classical problem of combining RP and SP, as an alternative to the traditional NL method.

This study presents a framework of multitask learning deep neural networks (MTLDNNs) to jointly analyze RP and SP, demonstrating MTLDNNs' theoretical flexibility, empirical performance, and behavioral intuition. An MTLDNN architecture starts with shared layers capturing the similarities between RP and SP, and ends with task-specific layers capturing their differences (Fig. 1) (Caruana, 1997). We first demonstrate that MTLDNNs are theoretically more general than NL owing to their automatic feature learning, soft constraints, and diverse architectures. Then we apply the MTLDNN framework to a data set collected in Singapore, which was designed to analyze the adoption of autonomous vehicles (AVs). In the empirical experiments, we compare the MTLDNNs to two NL benchmarks using prediction accuracy and cross-entropy loss.2 To understand the determinants of AV adoption, we visualize the relationship between choice probabilities and input variables and compute the elasticity values using MTLDNNs' gradient information (Baehrens et al., 2010; Simonyan et al., 2013). Overall, our analysis demonstrates that the MTLDNNs are theoretically appealing in leveraging the shared information between RP and SP, and are capable of revealing meaningful behavioral patterns, although the gain in empirical performance, particularly measured by cross-entropy losses, is still limited.
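The shared-plus-task-specific structure described above can be sketched as a minimal forward pass in numpy. This is an illustrative sketch, not the paper's exact configuration: the layer sizes, single shared layer, ReLU activation, and random initialization are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

# Illustrative dimensions: input width, shared hidden width, RP/SP alternatives.
d, h, K_r, K_s = 10, 16, 4, 5  # SP has one extra alternative (the AV)

# Shared layer: captures what RP and SP have in common.
W_shared = rng.normal(0, 0.1, (d, h)); b_shared = np.zeros(h)
# Task-specific heads: capture the differences between RP and SP.
W_rp = rng.normal(0, 0.1, (h, K_r)); b_rp = np.zeros(K_r)
W_sp = rng.normal(0, 0.1, (h, K_s)); b_sp = np.zeros(K_s)

def mtldnn_forward(x, task):
    z = relu(x @ W_shared + b_shared)      # shared representation
    if task == "rp":
        return softmax(z @ W_rp + b_rp)    # RP choice probabilities
    return softmax(z @ W_sp + b_sp)        # SP choice probabilities

x = rng.normal(size=(3, d))                # three example observations
p_rp = mtldnn_forward(x, "rp")             # shape (3, 4)
p_sp = mtldnn_forward(x, "sp")             # shape (3, 5)
```

Both heads consume the same shared representation, which is how the architecture pools the information common to the two data sources.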

This study contributes to the choice modeling community by being the first to present the MTLDNN framework in the important context of combining RP and SP. Future studies can investigate deeper theoretical and empirical questions revolving around this topic. Particularly, future researchers should investigate the inconsistency between prediction accuracy and cross-entropy losses, because the two metrics represent the different perspectives from machine learning and classical choice modeling. Researchers can also investigate the classical theoretical question (e.g. modeling the structure of random utility terms of RP and SP) under this MTLDNN framework and improve the empirical performance of this study by using advanced MTLDNN architectures (Long and Wang, 2015; Hashimoto et al., 2016; Misra et al., 2016; Ruder et al., 2017). Future studies can also apply the MTLDNN framework to other choice scenarios, such as jointly analyzing car ownership and travel mode choice (Train, 1980; Zegras, 2010), activity patterns and trip chain choices (Kitamura et al., 1992; Golob and McNally, 1997), and many others that are traditionally analyzed by structural equation models (SEM). For future researchers to replicate and improve upon our work, we have uploaded the project to GitHub.3

This paper is organized as follows. Section 2 reviews the MTLDNN and NL models. Section 3 presents the MTLDNN framework and compares its theoretical properties to the NL models. Section 4 presents data and methods, and Section 5 analyzes model performance and presents the economic information in MTLDNNs. Section 6 concludes our findings and discusses future research directions.

Section snippets

Literature review

For travel demand analysis, RP and SP data are important but subject to different problems. The RP data can have limited coverage of values, high correlation between attributes, and poor quality of background information (Ben-Akiva et al., 1994), although it typically has better external validity. In the SP data, respondents could fail to provide valid answers because of their sensitivity to survey formats, unrealistic hypothetical scenarios (Small and Winston, 1998), or even

Multitask learning deep neural network for RP and SP

Let $x_{r,i}, x_{s,t} \in \mathbb{R}^d$ denote the input variables for RP and SP respectively, where $r$ and $s$ stand for RP and SP, $i \in \{1, 2, \ldots, N_r\}$ and $t \in \{1, 2, \ldots, N_s\}$ are the indices of RP and SP observations, and $d$ represents the input dimension. The output choices of RP and SP are denoted by $y_{r,i}$ and $y_{s,t}$; $y_{r,i} \in \{0,1\}^{K_r}$ and $y_{s,t} \in \{0,1\}^{K_s}$; $K_r$ and $K_s$ are the dimensions of the outputs. In our case, SP has more alternatives than RP since SP includes a new product that is not available in the existing market ($K_s > K_r$). Both $y_{r,i}$
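The notation above maps directly onto array shapes, and a joint training objective over the two tasks is commonly written as a weighted sum of per-task cross-entropy losses. The sketch below uses illustrative sizes and an equal task weighting, both of which are assumptions rather than the paper's specification; the uniform "predictions" are placeholders that only serve to show the objective's structure.

```python
import numpy as np

# Illustrative sizes; K_s > K_r because SP adds the new AV alternative.
N_r, N_s, d, K_r, K_s = 100, 80, 10, 4, 5
rng = np.random.default_rng(1)

x_r = rng.normal(size=(N_r, d))                  # RP inputs x_{r,i}
x_s = rng.normal(size=(N_s, d))                  # SP inputs x_{s,t}
y_r = np.eye(K_r)[rng.integers(K_r, size=N_r)]   # one-hot RP choices in {0,1}^{K_r}
y_s = np.eye(K_s)[rng.integers(K_s, size=N_s)]   # one-hot SP choices in {0,1}^{K_s}

def cross_entropy(y, p):
    """Average negative log-likelihood of one-hot choices y under probabilities p."""
    return -np.mean(np.sum(y * np.log(p), axis=1))

# Placeholder uniform predictions, just to evaluate the objective's shape.
p_r = np.full((N_r, K_r), 1.0 / K_r)
p_s = np.full((N_s, K_s), 1.0 / K_s)

# Equal task weights here are an assumption; in practice the weights are tunable.
joint_loss = cross_entropy(y_r, p_r) + cross_entropy(y_s, p_s)
# For uniform predictions this equals log(K_r) + log(K_s) ≈ 2.9957.
```

Minimizing this joint loss over shared and task-specific parameters is what lets the RP and SP samples inform each other through the shared layers.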

Data collection

A survey was conducted through Qualtrics.com in July 2017 to analyze mode choice preferences regarding autonomous vehicles (AVs) in Singapore. The survey consisted of three sections: RP, SP, and respondents' demographics. In the first part (RP), the respondents reported the zip codes of the origin and destination (OD) of their most recent trip with a specific trip purpose, which was randomly drawn from commuting, shopping, or recreation, along with the travel mode choice of the trip that was

Model performance

Table 2 summarizes the prediction accuracy and cross-entropy losses of MTLDNN (Top 1), MTLDNN ensemble over top 10 models (MTLDNN-E), NL with parameter constraints (NL-C), and NL with no parameter constraints (NL-NC). In Table 2, Panel 1 reports the joint prediction accuracy for RP and SP, as well as the individual RP and SP performance, in the testing and training sets; Panel 2 reports the cross-entropy loss for the joint RP and SP data set.
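The two metrics in Table 2 can move in opposite directions: accuracy only checks whether the arg-max alternative is correct, while cross-entropy rewards well-calibrated probabilities. The toy comparison below (numbers are hypothetical, not from the paper) shows a model that chooses correctly more often yet has a worse cross-entropy, because its correct predictions are barely confident.

```python
import numpy as np

# True chosen alternative for three observations, two alternatives each.
y = np.array([0, 1, 0])

def metrics(p):
    """p: predicted probabilities, shape (3, 2). Returns (accuracy, cross-entropy)."""
    acc = np.mean(p.argmax(axis=1) == y)
    ce = -np.mean(np.log(p[np.arange(len(y)), y]))
    return acc, ce

# Model A: very confident and right twice, moderately wrong once.
p_a = np.array([[0.99, 0.01],
                [0.01, 0.99],
                [0.30, 0.70]])   # wrong on the third observation
# Model B: right every time, but only barely above 0.5.
p_b = np.array([[0.51, 0.49],
                [0.49, 0.51],
                [0.51, 0.49]])

acc_a, ce_a = metrics(p_a)   # accuracy 2/3, cross-entropy ≈ 0.408
acc_b, ce_b = metrics(p_b)   # accuracy 1.0, cross-entropy ≈ 0.673
```

Model B wins on accuracy but loses on cross-entropy, the same qualitative pattern reported for the MTLDNNs relative to the NL benchmarks.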

The MTLDNNs outperform the NL models in terms of prediction accuracy,

Conclusions and discussions

This study introduces the MTLDNN framework to combine RP and SP for demand analysis. It is fueled by the practical importance of combining RP and SP for prediction and the theoretical interest of using deep learning to analyze individual demand. This study investigates the theoretical, empirical, and behavioral dimensions of tackling the RP-SP problem under the MTLDNN framework, yielding the following findings.

Theoretically, it is feasible and appealing to combine RP and SP data using the

Author statement

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication.

Author contributions

S.W. conceived of the presented idea; S.W. developed the theory and reviewed previous studies; S.W. and Q.W. designed and conducted the experiments; S.W. drafted the manuscripts; Q.W. and J.Z. provided comments; J.Z. supervised this work. All authors discussed the results and contributed to the final manuscript.

CRediT authorship contribution statement

Shenhao Wang: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Project administration. Qingyi Wang: Software, Data curation, Visualization, Investigation. Jinhua Zhao: Supervision, Funding acquisition, Resources.

Declaration of competing interest

The authors declare no conflict of interest.

Acknowledgements

We thank Singapore-MIT Alliance for Research and Technology (SMART) for partially funding this research. We thank Mary Rose Fissinger for her careful proofreading.

References (69)

  • Xin Ye et al. An exploration of the relationship between mode choice and complexity of trip chaining patterns. Transp. Res. Part B Methodol. (2007)
  • Martin Anthony et al. Neural Network Learning: Theoretical Foundations (2009)
  • Andreas Argyriou et al. Multi-task feature learning
  • David Baehrens. How to explain individual classification decisions. J. Mach. Learn. Res. (2010)
  • Peter L. Bartlett et al. Rademacher and Gaussian complexities: risk bounds and structural results. J. Mach. Learn. Res. (2002)
  • Peter L. Bartlett. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks
  • Moshe Ben-Akiva. Combining revealed and stated preferences data. Market. Lett. (1994)
  • Yoshua Bengio et al. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • Yves Bentz et al. Neural networks and the multinomial logit for brand choice modelling: a hybrid approach. J. Forecast. (2000)
  • Mark A. Bradley et al. Estimation of logit choice models using mixed stated preference and revealed preference information
  • Rich Caruana. Multitask learning. Mach. Learn. (1997)
  • Jonathan D. Cohen. Measuring Time Preferences (2016)
  • Ronan Collobert et al. A unified architecture for natural language processing: deep neural networks with multitask learning
  • Theodoros Evgeniou et al. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. (2005)
  • Manuel Fernández-Delgado. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. (2014)
  • Matthias Feurer et al. Chapter 1. Hyperparameter Optimization (2018)
  • Thomas F. Golob et al. A vehicle use forecasting model based on revealed and stated vehicle type choice and utilisation data. J. Transport Econ. Pol. (1997)
  • Noah Golowich et al. Size-independent sample complexity of neural networks
  • Kazuma Hashimoto. A joint many-task model: growing a neural network for multiple NLP tasks
  • Jerry Hausman. Mismeasured variables in econometric analysis: problems from the right and problems from the left. J. Econ. Perspect. (2001)
  • David A. Hensher et al. Using stated response choice data to enrich revealed preference discrete choice models. Market. Lett. (1993)
  • Stephane Hess et al. Should reference alternatives in pivot design SC surveys be treated differently? Environ. Resour. Econ. (2009)
  • Geoffrey Hinton et al. Distilling the knowledge in a neural network
  • Laurent Jacob et al. Clustered multi-task learning: a convex formulation