
Abstract

We discuss the relevance of the recent machine learning (ML) literature for economics and econometrics. First, we outline the differences in goals, methods, and settings between the ML literature and the traditional econometrics and statistics literatures. Next, we review specific methods from the ML literature that we view as important for empirical researchers in economics, including supervised learning methods for regression and classification, unsupervised learning methods, and matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, including causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
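As a concrete illustration of the supervised learning methods for regression mentioned above, the following is a minimal sketch of the lasso with cross-validated regularization, using scikit-learn on synthetic data (the data-generating process, sample sizes, and coefficient values here are purely illustrative assumptions, not drawn from the review):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic, purely illustrative data: 200 observations, 50 covariates,
# of which only the first 5 actually affect the outcome (a sparse model).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
beta = np.zeros(50)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(size=200)

# LassoCV picks the regularization penalty by cross-validation --
# the standard ML approach to navigating the bias-variance trade-off,
# in contrast to the traditional econometric focus on unbiasedness.
model = LassoCV(cv=5).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
print(f"lasso retained {n_selected} of 50 covariates")
```

The point of the sketch is the division of labor the review emphasizes: the researcher specifies a flexible model class, and out-of-sample performance (here, cross-validation) rather than a priori theory determines how much regularization to apply.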

DOI: 10.1146/annurev-economics-080217-053433
Published: 2019-08-02
