Elsevier

Neural Networks

Volume 144, December 2021, Pages 279-296

Deep Tobit networks: A novel machine learning approach to microeconometrics

https://doi.org/10.1016/j.neunet.2021.09.003

Abstract

Tobit models (also called “censored regression models” or classified as “sample selection models” in microeconometrics) have been widely applied to microeconometric problems with censored outcomes. However, due to their linear parametric settings and restrictive normality assumptions, traditional Tobit models fail to capture pervasive nonlinearities and thus may be inadequate for microeconometric analysis of large-scale datasets. This paper proposes two novel deep neural networks for Tobit problems and explores machine learning approaches in the context of microeconometric modeling. We connect the censored outputs in Tobit models with deep learning techniques that are usually thought to be unrelated to microeconometrics, and use the rectified linear unit activation and a specially designed network structure to implement the censored output mechanisms and realize the underlying econometric concepts. The benchmark Tobit-I and Tobit-II models are then reformulated as two carefully designed deep feedforward neural networks, named the deep Tobit-I network and the deep Tobit-II network, respectively. A novel significance testing method is developed based on the proposed networks. Compared with the traditional models, our networks with deep structures can effectively describe the underlying highly nonlinear relationships and achieve considerable improvements in fitting and prediction. With the novel testing method, the proposed networks enable highly accurate and sophisticated econometric analysis with minimal distributional assumptions. The encouraging numerical experiments on synthetic and realistic datasets demonstrate the utility and advantages of the proposed method.

Introduction

Microeconometrics deals with model-based analysis of economic behavior at the individual level. As representative microeconometric models, Tobit models (also called “censored regression models” or classified as “sample selection models” in microeconometrics) are regression models in which the observed range of the dependent variable is censored in some way. As two leading examples, the Tobit-I and Tobit-II models have been widely used in many practical econometric problems, such as medical expenditures (Neelon et al., 2011), labor supply behavior (Greene & Quester, 1982), internet auctions (Livingston, 2005), and financial portfolios at the household level (Brown et al., 2015).

Although the traditional Tobit models have proven useful for econometric analysis, they rely on strict assumptions about model settings, which creates obstacles when large datasets are encountered (Varian, 2014). In the Tobit-I model, the latent variable is set to be linear in the regressors. In the Tobit-II model, linear classification models, such as Logistic and Probit models, are usually used for the binomial part, and linear or generalized linear regression models are often adopted for the continuous part (Cameron & Trivedi, 2005). However, microeconometric data are essentially generated by the micro decisions of individuals and are by nature highly complex, heterogeneous, and nonlinear, especially in the context of large datasets (Mullainathan & Spiess, 2017). Thus, the traditional models with linear parametric structures could be inadequate to capture the complex relationships in the underlying dynamics, which may result in serious underfitting (Athey & Imbens, 2019). As the outputs of the Tobit models are censored, ordinary least squares (OLS) leads to inconsistent parameter estimation (Goldberger, 1981). Alternative estimation procedures, such as the Bayesian method (Van Hasselt, 2011) and the maximum likelihood approach (Hasebe, 2013, Hill et al., 2018), were developed to obtain consistent estimates. These estimation and hypothesis testing methods were developed under the strong distributional assumption that the random errors are normally distributed with a constant variance. However, the random errors are frequently heterogeneous or non-Gaussian in practice, implying that the conventional Tobit models and the related econometric analysis could be inaccurate for many practical problems (Farrell et al., 2021, Jondeau et al., 2007).
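The inconsistency of OLS under censoring can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the paper: the data-generating process (a single standard normal regressor, a true slope of 2, standard normal errors, censoring at zero) and the helper names `ols_slope` and `simulate` are assumptions chosen purely for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (intercept included)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def simulate(n=20000, beta=2.0, seed=0):
    """Fit OLS to a latent linear outcome and to its zero-censored version."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Latent outcome: linear in x with Gaussian noise (assumed design).
    latent = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    # Observed outcome: censored from below at zero, as in a Tobit-I setting.
    observed = [max(0.0, y) for y in latent]
    return ols_slope(xs, latent), ols_slope(xs, observed)


slope_latent, slope_censored = simulate()
print(f"OLS slope on latent outcome:   {slope_latent:.3f}")
print(f"OLS slope on censored outcome: {slope_censored:.3f}")
```

In this assumed design roughly half of the observations are censored, and the OLS slope fitted to the censored outcome comes out at about half of the true slope, while the slope fitted to the (normally unobservable) latent outcome stays close to 2.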

Compared with traditional linear models, machine learning methods, including random forests, support vector machines, and deep neural networks (DNNs), can effectively model complex nonlinear relationships, leading to substantial improvements in fitting and out-of-sample prediction. This appealing feature not only helps solve many practical econometric problems where accurate predictions are required, such as credit rating, empirical asset pricing, and quantitative investment, but also implies that machine learning models can be better choices for approximating the underlying “true models”, especially for large datasets. Based on machine learning models, highly efficient statistical inference methods can be developed to support econometric analysis, such as examination of treatment effects (Thach et al., 2020), causal inference (Knaus et al., 2021, Varian, 2016, Wager and Athey, 2018), significance testing (Farrell et al., 2021), factor analysis (Bao et al., 2020, Gu et al., 2021), and financial time series prediction (Li et al., 2015). Moreover, DNNs with flexible functional forms can effectively describe interaction effects and thus also show advantages in econometric problems with high-dimensional data (Gu et al., 2021).

As a prominent machine learning technique, deep learning has achieved widespread success in a variety of fields, such as image analysis, speech recognition, and text understanding, where complex and large-scale modeling problems arise (LeCun et al., 2015, Schmidhuber, 2015). Meanwhile, the existing literature has shown many interesting similarities between neural networks and conventional econometric models in certain key concepts. For example, deep feedforward networks have been proven able to approximate a very broad class of nonlinear functions to arbitrary accuracy, which corresponds to nonlinear fitting or nonparametric regression problems in econometrics (Hornik et al., 1989, White, 1992). The binary classification models of the machine learning community have been interpreted as limited dependent variable models (Iskhakov et al., 2020). An autoencoder aims to learn a representation (encoding) of a set of data, typically for dimensionality reduction, which could be used to design parsimonious econometric models (Wang et al., 2016). Nevertheless, with the emergence of large-scale microeconomic data, developing efficient machine learning models for various microeconometric problems is still an open and interesting topic.

In this study, we introduce deep learning techniques to the traditional Tobit-I and Tobit-II models to relax the linearity and normality assumptions. Tobit models are usually referred to as “sample selection models” in the econometric community. We emphasize that “sample selection” has its own definition in microeconometrics: it is a censored output mechanism determined by the specific Tobit problem (specified in Section 2.1, Benchmark Tobit models, and Section 2.2, Deep Tobit networks). A main challenge of this study is developing effective techniques to realize the censored output mechanisms of Tobit problems from a machine learning perspective. Motivated by the similarity between the rectified linear unit (ReLU) activation and the censored output mechanism of the Tobit-I model, we propose to use the ReLU activation to reformulate that mechanism. Moreover, we employ a multi-layer feedforward neural network, which is distribution- and parametric-form-free, to reformulate the linear regression part of the Tobit-I model. By doing so, the newly constructed deep Tobit-I network (DTN-I) can effectively uncover the complex nonlinear relationships of interest while maintaining the censored outcome of Tobit-I problems. Likewise, inspired by the similarity between the output gating mechanism of the long short-term memory network (LSTM) and the output mechanism of the Tobit-II model, we propose to use the output gating mechanism of LSTM to reformulate the output mechanism of the Tobit-II model and adopt multi-layer feedforward neural networks to reformulate the linear regressions in the Tobit-II model. The proposed deep Tobit-II network (DTN-II) can accommodate nonlinearity and guarantees that the outputs are consistent with the data characteristics and econometric concepts described by Tobit-II problems.

Notably, the proposed models do not involve any distributional assumption. Thus, traditional hypothesis testing methods become inapplicable, which poses another challenge for this study. Based on the paired sample t-test, we construct a novel significance testing method to address the problem. Compared with the traditional Tobit-I and Tobit-II models, our method has two major advantages. First, the proposed network models with deep structures are highly capable of uncovering complex nonlinear relationships between variables and achieve considerable improvements in fitting and prediction performance. Second, with the newly developed significance testing method, the proposed network models can provide more reliable econometric analysis with minimal assumptions. The encouraging numerical experiments on synthetic and realistic datasets demonstrate the utility and advantages of the proposed model.
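The two output mechanisms described above can be sketched as forward passes. This is an illustrative sketch only, not the authors' implementation: the layer sizes, the random initialization, the use of a sigmoid gate multiplied into the outcome value, and all function names (`init_mlp`, `mlp`, `dtn1_forward`, `dtn2_forward`) are assumptions, since this excerpt does not give the exact architectures or training losses.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_in, n_hidden, rng):
    """Random weights for a one-hidden-layer network with a scalar output."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def mlp(params, x):
    """Feedforward pass: ReLU hidden layer followed by a linear output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


def dtn1_forward(params, x):
    """Tobit-I style output: a final ReLU censors the latent value at zero."""
    return relu(mlp(params, x))


def dtn2_forward(gate_params, value_params, x):
    """Tobit-II style output: a gate in [0, 1] scales a separate outcome value."""
    gate = sigmoid(mlp(gate_params, x))   # plays the role of the selection part
    value = mlp(value_params, x)          # plays the role of the continuous part
    return gate * value


rng = random.Random(0)
n_features = 3
latent_net = init_mlp(n_features, 4, rng)
gate_net = init_mlp(n_features, 4, rng)
value_net = init_mlp(n_features, 4, rng)

x = [0.5, -1.2, 0.3]
print("DTN-I style output: ", dtn1_forward(latent_net, x))
print("DTN-II style output:", dtn2_forward(gate_net, value_net, x))
```

The point of the sketch is structural: the final ReLU guarantees a non-negative, zero-censored output regardless of how nonlinear the inner network is, and the multiplicative gate lets one network decide whether an outcome is expressed while another determines its size.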

The main contributions of this article are summarized as follows:

(1) We propose two novel DNNs to learn microeconometric patterns using machine learning approaches. The proposed DTN-I and DTN-II are particularly advantageous over the traditional Tobit-I and Tobit-II models in capturing complex nonlinearity. To the best of our knowledge, this study is the first to introduce machine learning techniques to the Tobit models, which are commonly used to model censored outcomes in microeconometrics.

(2) Based on the proposed network models, a novel significance testing method is developed to provide highly reliable and promising econometric analysis.

(3) By demonstrating how econometric models can be reformulated as neural networks, this study broadens insights into the construction of microeconometric models from a machine learning perspective.
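For readers unfamiliar with the paired sample t-test on which the testing method in contribution (2) is said to be based, the generic statistic can be sketched as follows. This shows only the textbook test, not the authors' procedure, which this excerpt does not describe. The example data, and the idea of pairing per-observation squared errors from a model fitted with and without one input variable, are hypothetical assumptions used for illustration.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for H0: the mean of the paired differences a_i - b_i is zero."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    # Raises ZeroDivisionError if all differences are identical.
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical per-observation squared errors of the same model evaluated
# without and with one input variable (illustrative numbers only).
errors_without = [0.40, 0.35, 0.50, 0.45, 0.60]
errors_with = [0.10, 0.20, 0.15, 0.30, 0.25]

t_stat = paired_t_statistic(errors_without, errors_with)
print(f"paired t statistic: {t_stat:.3f} (compare with t, {len(errors_with) - 1} d.o.f.)")
```

A large positive statistic in this hypothetical setup would suggest that removing the variable systematically increases the error, which is one plausible way to read "significance" without assuming normally distributed model errors.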

The rest of this article is organized as follows. Section 2 presents the methodology, including the proposed network models and the significance testing method. The benchmark Tobit-I and Tobit-II models are briefly reviewed at the beginning of that section. Section 3 conducts extensive numerical experiments on synthetic and realistic datasets to assess the empirical performance of the proposed method. A comparison between the proposed and existing methods, as well as an ablation study, is also provided. Section 4 concludes the paper with discussions.

Section snippets

Benchmark Tobit models

In this section, we briefly review the traditional Tobit-I and Tobit-II models (Cameron & Trivedi, 2005), and present the limitations of traditional methods.
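For orientation, the standard textbook forms of the two benchmark models can be written as below. The notation is an assumption of this sketch and may differ from that used in the paper's own Section 2.1. In particular, whether a non-selected Tobit-II outcome is treated as unobserved or recorded as zero depends on the application.

```latex
% Tobit-I (censored regression): one latent equation, censoring at zero
\begin{align}
  y_i^{*} &= x_i^{\top}\beta + \varepsilon_i,
    \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^{2}), \\
  y_i &= \max\bigl(0,\; y_i^{*}\bigr).
\end{align}

% Tobit-II (sample selection): a selection equation and an outcome equation
\begin{align}
  y_{1i}^{*} &= x_{1i}^{\top}\beta_{1} + \varepsilon_{1i}, \qquad
  y_{2i}^{*} = x_{2i}^{\top}\beta_{2} + \varepsilon_{2i}, \\
  y_{2i} &=
  \begin{cases}
    y_{2i}^{*}, & \text{if } y_{1i}^{*} > 0, \\
    \text{not observed (often recorded as } 0\text{)}, & \text{if } y_{1i}^{*} \le 0,
  \end{cases}
\end{align}
% with (\varepsilon_{1i}, \varepsilon_{2i}) assumed jointly normal in the classical setup.
```

Both forms make the two restrictions discussed in the introduction explicit: the latent equations are linear in the regressors, and the errors are assumed to be normal.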

Results

This section provides numerical examples to demonstrate the effectiveness of our method based on artificial and realistic data. The proposed DTN-I and DTN-II are compared with traditional Tobit models in terms of prediction accuracy and the capacity to learn nonlinear microeconometric problems.

Discussion

Although machine learning methods including deep learning have achieved great success in many fields, deploying machine learning techniques in the context of traditional microeconometric problems remains an open and important topic. In this paper, we propose two novel DNN-based microeconometric models by combining deep learning and traditional microeconometric theory. The proposed DTNs can achieve robust and accurate prediction results in large datasets and provide satisfactory significance

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Social Science Foundation of China under Project 19BTJ025, and General Research Fund grants 14301918 and 14302519 from the Research Grants Council of the Hong Kong Special Administrative Region.

References (49)

  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks.
  • Van Hasselt, M. (2011). Bayesian inference in a sample selection model. Journal of Econometrics.
  • Wang, Y. et al. (2016). Auto-encoder based dimensionality reduction. Neurocomputing.
  • Athey, S. et al. (2019). Machine learning methods that economists should know about. Annual Review of Economics.
  • Bao, Y. et al. (2020). Detecting accounting fraud in publicly traded US firms using a machine learning approach. Journal of Accounting Research.
  • Cameron, A.C. et al. (2005). Microeconometrics: methods and applications.
  • Duchi, J. et al. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research.
  • Farrell, M.H. et al. (2021). Deep neural networks for estimation and inference. Econometrica.
  • Gers, F.A. et al. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation.
  • Goodfellow, I. et al. (2016). Deep learning.
  • Greene, W.H. et al. (1982). Divorce risk and wives' labor supply behavior. Social Science Quarterly.
  • Hasebe, T. (2013). Copula-based maximum-likelihood estimation of sample-selection models. The Stata Journal.
  • He, H. et al. (2013). Imbalanced learning: foundations, algorithms, and applications.
  • Heckman, J.J. (1979). Sample selection bias as a specification error. Econometrica.