Insights from machine learning for evaluating production function estimators on manufacturing survey data

Abstract

National statistical organizations often rely on non-exhaustive surveys to estimate industry-level production functions in years in which a full census is not conducted. When analyzing data from non-census years, we propose selecting an estimator based on a weighting of its in-sample and predictive performance. We compare the Cobb–Douglas functional assumption to existing nonparametric shape-constrained estimators and a newly proposed estimator. For simulated data, we find that our proposed estimator has the lowest weighted errors. Using the 2010 Chilean Annual National Industrial Survey, a Cobb–Douglas specification describes at least 90% as much variance as the best alternative estimator in practically all cases considered, providing two insights: the benefits of using application data for selecting an estimator, and the benefits of structure in noisy data. Finally, for the five largest manufacturing industries, we find that a 30% sample, on average, achieves 60% of the R-squared value that would have been achieved with a full census; however, the variance across industries and samples is large.

Notes

  1. We will primarily focus on establishment censuses and surveys performed by census bureaus, where an establishment is defined as a single physical location where business is conducted or where services or industrial operations are performed. An example is the U.S. Annual Survey of Manufactures. This survey is conducted annually, except for years ending in 2 and 7, in which a census is performed (Foster et al. 2008). It includes approximately 50,000 establishments selected from the census universe of 346,000, or approximately a 15% sample (Fort et al. 2013).

  2. The Chilean National Institute of Statistics uses the terms survey and census, whereas in other parts of economics the terms sample and population are commonly used for the same concepts.

  3. A body of economic literature that is less computational and more aggregate in nature is the growth accounting literature (Solow 1957; Barro and Sala-i-Martin 2003). These methods rely on price information, a cost minimization assumption, and parametric forms to deterministically compute the coefficients of a first-order approximation of a production function using observed input cost shares (see for example Syverson 2011). This literature foregoes any model adequacy check of the production model.

  4. Because of our focus on a conditional-mean estimator, we do not model inefficiency and therefore do not report inefficiency levels.

  5. For example, the Cobb–Douglas form is widely used in Monte Carlo simulations (Andor and Hesse 2014); however, the production economics literature does not provide any support for this standard assumption. The Trans-log functional form is more flexible than the Cobb–Douglas functional form; however, its parameters and domain have to be selected carefully to assure the Trans-log function satisfies concavity. For these reasons, we sacrifice flexibility and select the Cobb–Douglas functional form so that estimates consistent with concavity are more likely to result.

  6. A partitioned region is referred to as a basis region, Hannah and Dunson (2013).

  7. We use lowercase letters as indices and capital letters to indicate the number of components.

  8. The tolerance is set to 1% and CNLS is used as the large model in all of our examples. Initially, we do not use the Generalized Cross Validation (GCV) score approximation used by Hannah and Dunson (2013), because they assert that GCV’s predictive results are only comparable with full cross validation strategies for problems with \(n\ge 5000\), which are larger than the datasets we consider in this paper.

  9. The advantage of the machine learning based comparison methods over standard Monte Carlo simulation methods is that we do not need to specify a functional form to generate data. However, we are only able to make statements about the estimators for the particular application dataset under consideration. Monte Carlo simulation methods might still be preferred to explore a larger set of potential data generation processes and their associated production functions.

  10. In-sample error is the deviation between the estimated function and the true function measured at the input vectors that appear in the observed sample.

  11. Predictive error is the deviation between the estimated function and the true function measured at input vectors that are unobserved in the sample.

  12. The distinction between comparing to the true function and comparing to the observed output is that in Monte Carlo simulations we know the true function and can compare our estimators to it. When using application data, we can only compare to the observed output in the sample.

  13. Note that the estimator “hat” character is over \(E(Er{r}_{ISf}^{nL})\) rather than \(Er{r}_{ISf}^{nL}\), the in-sample error for the particular learning set fitted with the production function.

  14. The concept described here is referred to as sample-specificity, or the estimator’s performance in a specific sample. If we use the same set of data (the training set) for estimating the model and for evaluating the model, there are significant benefits to overfitting. Even if we use the same DGP to generate more observations for the input vectors used in fitting the model, we would expect a worse fit because this new output vector would have a different draw of random error terms. It is this difference between evaluating fit on the training set and evaluating fit on a new draw of the output vector for the same input vectors that motivates our use of the algorithm proposed in Hastie et al. (2009).

  15. Computing the in-sample error provides a more detailed measure of the quality of the production function on a full set than the learning error \({\overline{MSE}}_{yIS}^{nL}={\sum }_{v=1}^{V}{\sum }_{i=1}^{{n}_{L}}{({\hat{f}}_{vi}-{Y}_{vi})}^{2}/({n}_{L}V)\), because it averages performance across many possible \({\epsilon }_{i}\) residual values for the learning set input vectors.

  16. Specifically, if we intend to use the learning set’s residual sum of squares, calculation of an estimate of \({\sigma }^{2}\) would require knowledge of the functional estimator’s effective number of parameters. However, the effective number of parameters can be difficult to calculate for both nonparametric and sieve estimators.

  17. As a reviewer pointed out, the parametric Cobb–Douglas in multiplicative form with an additive error term is challenging not only in cross-sectional data, but also in a panel setting due to the incidental parameters issue. Specifically, analyzing the asymptotic performance, the variance of the noise term is a function of both \(T\) and \(N\), and unless both of these values approach infinity, the estimate of the noise variance will not converge to the true value.

  18. We experimented with a variety of starting values including linear and Cobb–Douglas estimates, zero values (the default values in Matlab), and the true values. While the starting values of zero perform quite poorly, all other starting values result in very similar estimates.

  19. Note that the definition of \({R}_{FS}^{2}\) implies that if the evaluated estimator fails to explain more variability than simply taking the mean of the output variable over the full sample, we will instead use the mean as our estimator.

  20. We reemphasize that unlike cross validation procedures in which the goal is to estimate \(E(Er{r}_{y}^{n})\), our goal is to estimate \(E(Er{r}_{y}^{nL})\).

  21. Again, expectations and averages of the error and optimism metrics discussed are taken over all possible learning sets of a given size.

  22. The maximum attainable noise-to-total variance levels of our real datasets (i.e., using the full set as the learning set) are very similar to those of our low noise settings. Compare \(1-{\overline{MSE}}_{FSy}^{nL}/var({Y}_{FS})\) in our low noise settings against the \({R}_{FS}^{2}\) results of the \(100\)% survey real datasets.

  23. Recall that the Cobb–Douglas function with an additive error term is used to maintain consistency of the error structure across estimators.

  24. While nonparametric estimators avoid potential functional form misspecification, nonparametric estimates of functional derivatives tend to have higher variances than parametric estimators, Hansen (2018).

  25. Within the class of monotonically increasing and concave production functions, the additional assumption of homotheticity can be imposed. A homothetic production function \(f({\boldsymbol{X}})\) can be written as \(g(h({\boldsymbol{X}}))\) where \(g\) is a monotonically increasing function and \(h\) is homogeneous of degree 1. Practically, nonhomotheticity implies that while input isoquants are convex for any output level, the shape of the input isoquant can vary across output levels.

  26. We calculate the MRTS as the ratio of the marginal product of capital to the marginal product of labor. Thus, we impose a positive sign convention.

  27. As mentioned, increased testing set variance as the learning set size increases does not seem to be large enough to affect the variance of our estimates across the different learning and testing set sizes considered.

  28. In Appendix 6.6, we further explore the sensitivity of the results shown in Fig. 3 to our assumption of a finite population of firms and discuss the consequences of considering an infinite amount of unobserved firms when assessing the predictive capabilities of our estimators, thus only evaluating estimator performance in terms of predictive error.

  29. These are hyperplanes which have zero coefficients on some input dimensions, implying it is possible to obtain output without the zero-coefficient inputs.

  30. Due to the increased computational burden of using larger datasets, we present results for a single replicate of the DGP for each sample size and only include learning set results. For this section we report RMSE results rather than MSE results because the latter are small and the differences are indistinguishable across settings.

References

  • Afriat SN (1967) The construction of utility functions from expenditure data. Int Econ Rev 8(1):67–77

  • Afriat SN (1972) Efficiency estimation of production functions. Int Econ Rev 13(3):568–598

  • Aigner D, Lovell CK, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Economet 6(1):21–37

  • Allen DM (1974) The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16(1):125–127

  • Andor M, Hesse F (2014) The stoned age: the departure into a new era of efficiency analysis? A Monte Carlo comparison of stoned and the “oldies” (sfa and dea). J Product Anal 41(1):85–109

  • Andor M, Parmeter C (2017) Pseudolikelihood estimation of the stochastic frontier model. Appl Econ 49(55):5651–5661

  • Balázs G, György A, Szepesvári C (2015) Near-optimal max-affine estimators for convex regression. Proceedings of the eighteenth international conference on artificial intelligence and statistics, PMLR 38:56–64

  • Barro R, Sala-i-Martin X (2003) Economic growth. The MIT Press, Cambridge, MA

  • Breiman L (1992) The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc 87(419):738–754

  • Breiman L (1993) Hinging hyperplanes for regression, classification, and function approximation. IEEE Trans Inf Theory 39(3):999–1013

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Burman P (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76(3):503–514

  • Chambers RG (1988) Applied production analysis: a dual approach. Cambridge University Press, Cambridge, UK

  • Chen X (2007) Large sample sieve estimation of semi-nonparametric models. Handb Economet 6:5549–5632

  • De Loecker J, Goldberg PK, Khandelwal AK, Pavcnik N (2016) Prices, markups, and trade reform. Econometrica 84(2):445–510

  • Efron B (2004) The estimation of prediction error: covariance penalties and cross-validation. J Am Stat Assoc 99(467):619–632

  • Fare R, Grosskopf S, Lovell CK (1994) Production frontiers. Cambridge University Press, Cambridge, UK

  • Fort TC, Haltiwanger J, Jarmin RS, Miranda J (2013) How firms respond to business cycles: the role of firm age and firm size. IMF Econ Rev 61(3):520–559

  • Foster L, Haltiwanger J, Syverson C (2008) Reallocation, firm turnover, and efficiency: selection on productivity or profitability? Am Econ Rev 98(1):394–425

  • Frisch R (1964) Theory of production. Springer Science & Business Media. Berlin, Germany

  • Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328

  • Hannah L, Dunson D (2012) Ensemble methods for convex regression with applications to geometric programming based circuit design. ICML'12 Proceedings of the 29th International Conference on International Conference on Machine Learning, 147–154

  • Hannah L, Dunson D (2013) Multivariate convex regression with adaptive partitioning. J Mach Learn Res 14(1):3261–3294

  • Hansen BE (2018) Econometrics. Department of Economics, University of Wisconsin, Madison, WI

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer series in statistics. Springer, New York

  • Huang R, Szepesvári C (2014) A finite-sample generalization bound for semiparametric regression: partially linear models. In Proceedings of the artificial intelligence and statistics, pp 402–410

  • Hwangbo H, Johnson AL, Ding Y (2015) Power curve estimation: functional estimation imposing the regular ultra passum law. Working Paper, SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2621033

  • Kuosmanen T (2008) Representation theorem for convex nonparametric least squares. Economet J 11(2):308–325

  • Kuosmanen T, Saastamoinen A, Sipiläinen T (2013) What is the best practice for benchmark regulation of electricity distribution? comparison of dea, sfa and stoned methods. Energy Policy 61:740–750

  • Kuosmanen T, Kortelainen M (2012) Stochastic non-smooth envelopment of data: semi-parametric frontier estimation subject to shape constraints. J Product Anal 38(1):11–28

  • Lee C-Y, Johnson AL, Moreno-Centeno E, Kuosmanen T (2013) A more efficient algorithm for convex nonparametric least squares. Eur J Oper Res 227(2):391–400

  • Mazumder R, Choudhury A, Iyengar G, Sen B (2019) A computational framework for multivariate convex regression and its variants. J Am Stat Assoc 114:318–331

  • Olesen OB, Ruggiero J (2014) Maintaining the regular ultra passum law in data envelopment analysis. Eur J Oper Res 235(3):798–809

  • Olesen OB, Ruggiero J (2018) An improved afriat-diewert-parkan nonparametric production function estimator. Eur J Oper Res 264(3):1172–1188

  • Shephard R (1970) Theory of cost and production functions. Princeton University Press, Princeton, New Jersey

  • Simar L, Zelenyuk V (2011) Stochastic fdh/dea estimators for frontier analysis. J Product Anal 36(1):1–20

  • Simar L, Van Keilegom I, Zelenyuk V (2017) Nonparametric least squares methods for stochastic frontier models. J Product Anal 47(3):189–204

  • Solow RM (1957) Technical change and the aggregate production function. Rev Econ Stat 39(3):312–320

  • Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Methodol 36(2):111–147

  • Syverson C (2011) What determines productivity? J Econ Lit 49(2):326–365

  • White H, Wooldridge J (1991) Some results on sieve estimation with dependent observations. In: Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the fifth international symposium in economic theory and econometrics, Chapter 18, Cambridge University Press, pp 459–493

  • Yagi D, Chen Y, Johnson AL, Kuosmanen T (2018) Shape constrained kernel-weighted least squares: Application to production function estimation for chilean manufacturing industries. J Bus Econ Stat, published online: https://doi.org/10.1080/07350015.2018.1431128

  • Yagi D, Chen Y, Johnson AL, Morita H (2019) An axiomatic nonparametric production function estimator: Modeling production in Japan’s cardboard industry. arXiv, Preprint at: https://arxiv.org/abs/1906.08359

Corresponding author

Correspondence to Andrew L. Johnson.

Appendix: Insights from machine learning for evaluating production function estimators on manufacturing survey data

This appendix includes:

  • Results for the case with 3 and 4 regressors; more extensive simulated dataset results (Appendix 6.1).

  • CAPNLS: Formulation and scalability to larger datasets (Appendix 6.2).

  • Parametric Bootstrap algorithm to calculate expected optimism (Appendix 6.3).

  • Comparison between different error structures for the parametric Cobb–Douglas function (Appendix 6.4).

  • Cobb–Douglas results with multiplicative residual assumption for Chilean manufacturing data (Appendix 6.5).

  • Application Results for Infinite Populations (Appendix 6.6).

  • Comprehensive Application Results: MPSS, MP, and MRTS (Appendix 6.7).

1.1 More extensive simulated dataset results

In this section, we provide additional Monte Carlo simulation results for the bivariate case and present results for the trivariate and tetravariate cases. The performance of all estimators deteriorates significantly with dimension due to the curse of dimensionality. The results reported are for the performance metric full set error, \(\widehat{E(Er{r}_{FS}^{nL})}\), which is a weighting of the expected in-sample error and the expected predictive error. While the parametric estimator should have the best performance in terms of expected predictive error, this is not the case for expected in-sample error. The parametric estimator is generally not able to fit the observed data with noise as well as the nonparametric estimators because of the parametric estimator’s lack of flexibility. Stated differently, the nonparametric estimators benefit from being able to overfit the observed sample. The effect of the parametric estimator’s lack of flexibility relative to the nonparametric estimators is pronounced in the higher dimensional, low noise cases.
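
To make the weighting concrete, the following minimal sketch (in Python, not the MATLAB used for the reported experiments) computes a full set error from in-sample and predictive squared errors against the true function, assuming the weights equal the shares of observed and unobserved input vectors in the finite full set.

    import numpy as np

    def full_set_error(f_hat_fs, f_true_fs, is_learning):
        """Weighted full set error: a convex combination of in-sample and
        predictive squared errors against the true function, weighted by the
        shares of observed and unobserved input vectors in the full set."""
        is_learning = np.asarray(is_learning, dtype=bool)
        w = is_learning.mean()                                # n_L / n_FS
        err_is = np.mean((f_hat_fs[is_learning] - f_true_fs[is_learning]) ** 2)
        err_pred = np.mean((f_hat_fs[~is_learning] - f_true_fs[~is_learning]) ** 2)
        return w * err_is + (1.0 - w) * err_pred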

1.1.1 Bi-variate input Cobb–Douglas DGP

In this section we reproduce Table 1 of the text, but also include computational times. We run our experiments on a computer with Intel Core2 Quad CPU 3.00 GHz and 8GB RAM using MATLAB R2016b and the solvers quadprog and lsqnonlin.

Table 10 Number of hyperplanes and runtimes for bivariate input Cobb–Douglas DGP

1.1.2 Tri-variate input Cobb–Douglas DGP

We consider the data generation process (DGP) \({Y}_{i}={X}_{i1}^{0.4}{X}_{i2}^{0.3}{X}_{i3}^{0.2}+{\epsilon }_{i}\), where \({\epsilon }_{i} \sim N(0,{\sigma }^{2})\), \(\sigma =\{0.01,0.05,0.1,0.2,0.3,0.4\}\) for our six noise settings, with the same small and large noise split as the previous example, and \({X}_{ij} \sim Unif(0.1,1)\) for \(j=1,2,3,\ i=1,\ldots ,{n}_{L}\). Again, CNLS’s expected full set errors exceed the displayed range for the learning-to-full set scenarios regardless of noise level, due to the poor predictive error values, which are partly linked to the higher proportion of ill-defined hyperplanes (Footnote 29) that CNLS fits. Compared to the bi-variate case, the parametric estimator deteriorates relative to the nonparametric estimators, i.e., the higher errors for the parametric estimator exceed the small scale of most panels in Fig. 4.
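
For concreteness, a minimal sketch of drawing one replicate from this DGP (Python; the sample size and noise level shown are illustrative choices):

    import numpy as np

    def draw_trivariate_sample(n_l, sigma, seed=0):
        """One replicate of the tri-variate DGP above:
        Y_i = X_i1^0.4 * X_i2^0.3 * X_i3^0.2 + eps_i, eps_i ~ N(0, sigma^2),
        with X_ij ~ Unif(0.1, 1)."""
        rng = np.random.default_rng(seed)
        X = rng.uniform(0.1, 1.0, size=(n_l, 3))
        f_true = X[:, 0] ** 0.4 * X[:, 1] ** 0.3 * X[:, 2] ** 0.2
        Y = f_true + rng.normal(0.0, sigma, size=n_l)
        return X, Y, f_true

    X, Y, f_true = draw_trivariate_sample(n_l=100, sigma=0.4)  # largest noise setting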

Fig. 4 Trivariate input Cobb–Douglas DGP results for small noise settings

Figure 5, however, shows that the errors given by the parametric estimator are lower than the errors for CNLS in learning-to-full settings. Further, CAP’s expected performance deteriorates relative to the bi-variate example. As in the bi-variate example, the expected full set error gap between CAPNLS and the correctly specified parametric estimator either favors CAPNLS at every learning set size or becomes more favorable for CAPNLS as the learning set size increases. Finally, Fig. 5 shows that CAPNLS can accurately recover a production function even when noise composes nearly 85% of the variance, as is the case when \(\sigma =0.4\).

Fig. 5 Trivariate input Cobb–Douglas DGP results for large noise settings

In Table 11, we observe that all methods fit slightly more hyperplanes than for the bi-variate example. The increase in the number of hyperplanes with increased dimensionality is moderate for both CAPNLS and CAP at all settings. For CNLS, while the number of hyperplanes does not significantly increase for \(n=100\), it increases significantly for the two larger datasets. The runtimes for all methods are also higher than in the previous example, i.e., CAPNLS’s times nearly double, although they stay below one minute for all scenarios.

Table 11 Number of hyperplanes and runtimes for trivariate input Cobb–Douglas DGP

1.1.3 Tetravariate input Cobb–Douglas DGP

We consider the DGP \({Y}_{i}={X}_{i1}^{0.3}{X}_{i2}^{0.25}{X}_{i3}^{0.25}{X}_{i4}^{0.1}+{\epsilon }_{i}\), where \({\epsilon }_{i} \sim N(0,{\sigma }^{2})\) and \({X}_{ij} \sim Unif(0.1,1)\) for \(j=1,2,3,4;\ i=1,\ldots ,{n}_{L}\) and \(\sigma =\{0.01,0.05,0.1,0.2,0.3,0.4\}\) for our six noise settings. The performance of all estimators further deteriorates due to the curse of dimensionality. Figures 6 and 7 show that for this higher dimensional example, the parameters of the parametric estimator are increasingly difficult to learn, and thus the parametric estimator can only predict the true function up to a certain accuracy, namely \({\bar{MSE}}_{FSf}^{nL}=0.015\), and tends to plateau at this error level even as the learning set size increases. Moreover, the benefits of CAPNLS over the other nonparametric methods are similar to the tri-variate example for the small noise settings, but significantly larger for the large noise settings. Finally, the gap between CAPNLS and all the other functional estimators, parametric or nonparametric, favors CAPNLS for all noise settings and learning set sizes.

Fig. 6 Tetra-variate input Cobb–Douglas DGP results for small noise settings

Fig. 7 Tetra-variate input Cobb–Douglas DGP results for large noise settings

Table 12 shows that the number of hyperplanes needed to fit the tetra-variate input production function does not significantly increase from the trivariate-input case for any of the methods. CAPNLS has 40–60% longer runtimes compared to the trivariate-input case. The runtimes increase with dimensionality; however, this is not a severe concern, because the input information used to fit a production function (or the output information in the case of a cost function) rarely exceeds four variables. The maximum recorded runtime for CAPNLS is still below two minutes, i.e., it is not large in absolute terms.

Table 12 Number of hyperplanes and runtimes for tetra-variate input Cobb–Douglas DGP

1.1.4 More extensive simulated dataset results

Tables 13–30 provide more detailed results. This information was shown in Figs. 1 and 2 in the main text and Figs. 4–7 in the Appendix.

Table 13 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.2\)
Table 14 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.3\)
Table 15 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.4\)
Table 16 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.2\)
Table 17 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.3\)
Table 18 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.4\)
Table 19 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.2\)
Table 20 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.3\)
Table 21 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.4\)
Table 22 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.01\)
Table 23 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.05\)
Table 24 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=2\), \(\sigma =0.1\)
Table 25 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.01\)
Table 26 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.05\)
Table 27 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=3\), \(\sigma =0.1\)
Table 28 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.01\)
Table 29 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.05\)
Table 30 \(MS{E}_{f}\), \(MS{E}_{ISf}\), \(MS{E}_{FSf}\), \(MS{E}_{y}\), \(MS{E}_{ISy}\), \(MS{E}_{FSy}\), time, and \(K\) results for \(d=4\), \(\sigma =0.1\)

1.2 CAPNLS: formulation and scalability to larger datasets

1.2.1 CAPNLS as a series of quadratic programs

Taking advantage of the linearly-constrained quadratic programming structure of CAPNLS is essential to achieve computational feasibility. Therefore, we write Problem (5) in the standard form

$$\begin{array}{l}\mathop{\rm{min}}\limits_{{\boldsymbol{\beta}}}\,\frac{1}{2}{{\boldsymbol{\beta }}}^{T}H{\boldsymbol{\beta }}+{{\boldsymbol{\beta }}}^{T}g\\ s.t.\ A{\boldsymbol{\beta }}\le 0,\ {\boldsymbol{\beta }}\ge {\boldsymbol{l}}\end{array}$$
(6.2.1)

Starting with the objective function from (4), we let \(\tilde{{\boldsymbol{X}}}=({\boldsymbol{1}},{\boldsymbol{X}})\) and write

$$\begin{array}{ll}\mathop{\rm{min}}\limits_{{({\beta }_{0k},\,{{\boldsymbol{\beta }}}_{-0k})}_{k=1}^{K}}\sum _{i=1}^{N}{({\beta }_{0[i]}+{{\boldsymbol{\beta }}}_{-0[i]}^{T}{{\boldsymbol{X}}}_{i}-{Y}_{i})}^{2}\\=\mathop{\rm{min}}\limits_{{({\boldsymbol{\beta }})}_{k=1}^{K}}\sum _{i=1}^{N}{({{\boldsymbol{\beta }}}_{[i]}^{T}{\tilde{{\boldsymbol{X}}}}_{i}-{Y}_{i})}^{2}\\=\cdots =\mathop{\rm{min} }\limits_{{({\boldsymbol{\beta }})}_{k=1}^{K}}\frac{1}{2}\sum _{i=1}^{N}{({{\boldsymbol{\beta }}}_{[i]}^{T}{\tilde{{\boldsymbol{X}}}}_{i})}^{2}-\sum _{i=1}^{N}({{\boldsymbol{\beta }}}_{[i]}^{T}{\tilde{{\boldsymbol{X}}}}_{i}{Y}_{i})\end{array}$$
(6.2.2)

where we have dropped the constant \({\sum }_{i=1}^{N}{Y}_{i}^{2}\) and multiplied by one half. To write the last expression in Eq. (6.2.2) in standard form, we first write \({\sum }_{i=1}^{N}{({{\boldsymbol{\beta }}}_{[i]}^{T}{\tilde{{\boldsymbol{X}}}}_{i})}^{2}\) using matrix operations. We define the observation-to-hyperplane \(n(d+1)\times K(d+1)\)-dimensional mapping matrix \(P\), with elements \(P((i-1)(d+1)+j,([i]-1)(d+1)+j)={\tilde{X}}_{ij}\), \(i=1,\cdots ,n,\ j=1,\cdots ,d+1\), and all other elements equal to zero. Similarly, we define the \(n\times n(d+1)\)-dimensional observation-specific vector product matrix \(S\), with elements \(S(i,(i-1)(d+1)+l)=1\) for \(l=1,\cdots ,d+1,\ i=1,\cdots ,n\). Then, we concatenate the vectors \({({{\boldsymbol{\beta }}}_{k})}_{k=1}^{K}\) into the \(K(d+1)\times 1\)-dimensional vector \({\boldsymbol{\beta }}\). It follows that

$$\sum _{i=1}^{N}{({{\boldsymbol{\beta }}}_{[i]}^{T}{\tilde{{\boldsymbol{X}}}}_{i})}^{2}={\boldsymbol{\beta}}^{T}{P}^{T}({S}^{T}S)P{\boldsymbol{\beta }}\,\text{and}\,{\sum _{i=1}^{N}}({\boldsymbol{\beta}}_{[i]}^{T}{\tilde{\boldsymbol{X}}}_{i}{Y}_{i})={\boldsymbol{\beta}}^{T}{P}^{T}{S}^{T}{\boldsymbol{Y}},$$
(6.2.3)

from which we have that \(H={P}^{T}({S}^{T}S)P\) and \(g=-{P}^{T}{S}^{T}{\boldsymbol{Y}}\).

To write the Afriat inequality constraints as the \(nK\times K(d+1)\)-dimensional matrix \(A\), we let elements \(A(K(i-1)+k,\ j+(d+1)([i]-1))={\tilde{X}}_{ij}\) and \(A(K(i-1)+k,\ j+(d+1)(k-1))=-{\tilde{X}}_{ij}\), for \(i=1,\cdots ,n,\ j=1,\cdots ,d+1,\ k=1,\cdots ,K\), and let all other elements equal zero, so that each row of \(A{\boldsymbol{\beta }}\le 0\) requires the assigned hyperplane to take a value no greater than hyperplane \(k\) at observation \(i\). Finally, we define the \(K(d+1)\)-dimensional vector \({\boldsymbol{l}}\) to have elements \({\boldsymbol{l}}((k-1)(d+1)+1)=0,\ k=1,\cdots ,K\), and all other elements equal to negative infinity.
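
The same quadratic program can also be posed directly in a modeling layer instead of assembling \(H\), \(g\), \(A\), and \({\boldsymbol{l}}\) by hand. The following is a minimal Python/cvxpy sketch for a fixed observation-to-hyperplane assignment; the assignment vector assign, the number of hyperplanes K, and the use of cvxpy are illustrative assumptions, not the MATLAB quadprog implementation used in the experiments.

    import cvxpy as cp
    import numpy as np

    def capnls_qp(X, Y, assign, K):
        """Least squares over assigned hyperplanes subject to the Afriat
        (concavity) inequalities and nonnegative intercepts (the beta >= l
        bound above). X is n-by-d, Y has length n, assign[i] in {0,...,K-1}."""
        n, d = X.shape
        b0 = cp.Variable(K)          # intercepts beta_{0k}
        B = cp.Variable((K, d))      # slopes beta_{-0k}
        fitted = cp.hstack([b0[assign[i]] + B[assign[i], :] @ X[i] for i in range(n)])
        cons = [b0 >= 0]             # the lower-bound vector l
        for i in range(n):
            for k in range(K):       # Afriat inequalities: assigned plane attains the minimum
                cons.append(b0[assign[i]] + B[assign[i], :] @ X[i]
                            <= b0[k] + B[k, :] @ X[i])
        prob = cp.Problem(cp.Minimize(cp.sum_squares(fitted - Y)), cons)
        prob.solve()
        return b0.value, B.value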

1.2.2 Scalability of CAPNLS to larger datasets

To demonstrate the performance of CAPNLS on larger datasets, we revisit the DGP used in the tri-variate case, specifically, \({Y}_{i}={X}_{i1}^{0.4}{X}_{i2}^{0.3}{X}_{i3}^{0.2}+{\epsilon }_{i}\), where \({\epsilon }_{i} \sim N(0,{\sigma }^{2})\), \(\sigma =0.1\) and \({X}_{ij} \sim Unif(0.1,1)\) for \(j=1,2,3,\ i=1,\cdots ,n\), and \(n=500,1000,2000,3000\), and \(5000\). Table 31 reports the estimator’s performance (Footnote 30).

Table 31 Number of hyperplanes and runtimes for trivariate input Cobb–Douglas DGP on larger datasets

We first conduct standard CAPNLS analysis and report learning errors, the number of fitted hyperplanes, and runtime results. Runtimes for datasets up to \(2000\) observations are well below the one-hour threshold, but there are significant scalability challenges for datasets larger than \(2000\) observations. Thus, we apply the Fast CAP stopping criterion in Hannah and Dunson (2013), which measures the GCV score improvement from the addition of one more hyperplane and stops the algorithm if no improvement has been achieved in two consecutive additions. Unlike Fast CAP, however, we apply it directly to the learning error computed against the observations. We denote the results for those runs with the CAPNLSF superscript and observe that the differences are minimal compared to following our standard partitioning strategy. This alternative stopping rule results in a highly scalable algorithm which can fit datasets of up to \(5000\) observations in \(\sim\!\! 40\) minutes.
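
A minimal sketch of this stopping rule (Python; add_hyperplane and learning_error are placeholder callables standing in for the partitioning step and the learning-error evaluation, not part of the actual implementation):

    def fit_with_stalling_rule(add_hyperplane, learning_error, k_max, patience=2):
        """Add hyperplanes greedily and stop once the learning error has not
        improved in `patience` consecutive additions."""
        model, best_err, stalls = None, float("inf"), 0
        for _ in range(k_max):
            model = add_hyperplane(model)   # refit with one additional hyperplane
            err = learning_error(model)
            if err < best_err:
                best_err, stalls = err, 0
            else:
                stalls += 1
                if stalls >= patience:
                    break
        return model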

1.3 Parametric Bootstrap algorithm to calculate expected optimism

We apply the following algorithm from Efron (2004) to compute in-sample optimism. First, we assume a Gaussian density \(p({\boldsymbol{Y}})=N(\hat{{\boldsymbol{Y}}},{\hat{\sigma }}^{2}{\boldsymbol{I}})\), where \(\hat{{\boldsymbol{Y}}}\) is the vector of estimated output values of the estimator for which we are assessing the in-sample optimism. We use \({\boldsymbol{I}}\) to indicate an identity matrix. We obtain \({\hat{\sigma }}^{2}\) from the residuals of a “big” model presumed to have negligible bias. Given CNLS’s high flexibility and complex description (many hyperplanes), we choose it as our “big” model. Although obtaining an unbiased estimate of \({\sigma }^{2}\) from CNLS’s residuals is complicated, i.e., there are no formal results regarding the effective number of parameters CNLS uses, using \(MS{E}_{yLearn}^{CNLS}\) as \({\hat{\sigma }}^{2}\) results in a downward biased estimator of \({\sigma }^{2}\). This downward bias in fact results in improved efficiency for the parametric bootstrap algorithm and is an example of a “little” bootstrap, see Breiman (1992). Thus, we let \({\hat{\sigma }}^{2}=MS{E}_{yLearn}^{CNLS}\). Efron (2004) then suggests drawing a large number \(B\) of simulated output vectors \({{\boldsymbol{Y}}}^{* }\) from \(p({\boldsymbol{Y}})\), fitting them to obtain estimates \({\hat{{\boldsymbol{Y}}}}^{* }\), and estimating \(co{v}_{i}=cov({\hat{Y}}_{i},{Y}_{i})\) by computing

$${\hat{cov}}_{i}=\sum _{b=1}^{B}{\hat{Y}}_{i}^{* b}({Y}_{i}^{* b}-{\bar{Y}}_{i}^{* })/(B-1),\quad {\bar{Y}}_{i}^{* }=\sum _{b=1}^{B}{Y}_{i}^{* b}/B.$$
(6.3.1)

We select \(B=500\) for all our experiments based on observed convergence of the \({\sum }_{i=1}^{N}{\hat{cov}}_{i}\) quantity. Further, we note that if the researcher is not comfortable with the assumption made about the size of \(MS{E}_{yLearn}^{CNLS}\) relative to \({\sigma }^{2}\), sensitivity analysis (by adding a multiplier \(c\,>\,1\), such that \(p({\boldsymbol{Y}})=N(\hat{{\boldsymbol{Y}}},c{\hat{\sigma }}^{2}{\boldsymbol{I}})\)) can be performed. Finally, we also note that non-Gaussian distributions can be used to draw the bootstrapped \({{\boldsymbol{Y}}}^{* }\) vectors. This is especially useful when considering inefficiency, because it can imply a skewed distribution.
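
A minimal sketch of this parametric bootstrap (Python; refit is a placeholder callable that refits the estimator under assessment to a simulated output vector, and is not part of the actual implementation):

    import numpy as np

    def bootstrap_cov_penalty(y_hat, sigma2, refit, B=500, seed=0):
        """Estimate cov_i = cov(Yhat_i, Y_i) as in Eq. (6.3.1): draw
        Y* ~ N(y_hat, sigma2 * I), refit each draw, and compute the empirical
        covariance of each fitted value with its simulated output."""
        rng = np.random.default_rng(seed)
        n = len(y_hat)
        Y_star = y_hat + rng.normal(0.0, np.sqrt(sigma2), size=(B, n))
        Yhat_star = np.vstack([refit(Y_star[b]) for b in range(B)])
        cov_i = np.sum(Yhat_star * (Y_star - Y_star.mean(axis=0)), axis=0) / (B - 1)
        return cov_i  # expected optimism for squared-error loss: 2/n * cov_i.sum()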

1.4 Comparison between different error structures for the parametric Cobb–Douglas function

This Appendix compares the performance of the standard log-linear Cobb–Douglas with an additive error estimator and the multiplicative Cobb–Douglas with an additive error estimator. Figures 8–13 consider the bivariate, trivariate, and tetravariate cases with both a low and a high noise setting. We find that both parametric estimators perform very similarly, with the correctly specified additive error estimator performing slightly better.

Fig. 8 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is low and the model includes two inputs

Fig. 9 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is high and the model includes two inputs

Fig. 10 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is low and the model includes three inputs

Fig. 11 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is high and the model includes three inputs

Fig. 12 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is low and the model includes four inputs

Fig. 13 A comparison of Cobb–Douglas multiplicative function with an additive error term (CDA) vs. Cobb–Douglas log-linear function with an additive error term (CDM) when the noise level is high and the model includes four inputs

1.5 Further Cobb–Douglas results with multiplicative residual assumption for Chilean manufacturing data

This section presents the parameter estimates for the additive Cobb–Douglas specification. Estimates are presented for the five largest industries in the Chilean manufacturing data. Table 32 shows that returns to scale (RTS) are decreasing (DRS) for Other Metal Products (2899), Wood (2010) and Structural Use Metal (2811). However, RTS are increasing (IRS) for Plastics (2520) and Bakeries (1541). All RTS estimates are within an additive factor of \(0.2\) of constant returns to scale.
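
For reference, RTS for a Cobb–Douglas technology is the sum of the estimated input elasticities; a minimal sketch (the elasticities shown are hypothetical values, not the Table 32 estimates):

    def cobb_douglas_rts(elasticities, tol=1e-9):
        """Classify returns to scale from Cobb-Douglas input elasticities:
        a sum below/at/above one implies DRS/CRS/IRS."""
        rts = sum(elasticities)
        if abs(rts - 1.0) <= tol:
            return rts, "CRS"
        return rts, "IRS" if rts > 1.0 else "DRS"

    # Hypothetical capital, labor, energy, services elasticities; sum is 1.10 -> IRS
    print(cobb_douglas_rts([0.3, 0.45, 0.1, 0.25]))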

Table 32 Cobb–Douglas parameter estimates for five industries

1.5.1 Cobb–Douglas results with multiplicative residual

In this subsection, we reconstruct Table 3 from the main text, but consider the alternative parametric estimator, a log-linear Cobb–Douglas function with a multiplicative error term (CDM).

Table 33 Ratio of CDM to Best Model performance

1.6 Application results for infinite populations

In this Appendix we explore the sensitivity of our application insights to the case in which predictive ability at any point of the production function is equally important. This represents a key departure from the assumptions made in the main body of the paper, as it translates into not weighting in-sample and predictive errors. Rather, since we are interested in the descriptive ability of the fitted production function on an infinite number of unobserved input vectors, we only consider the predictive error. To illustrate the consequences of this alternative assumption in detail, we present our results in Fig. 14, which is analogous to Fig. 3. In Fig. 14 we show replicate-specific as well as averaged \({R}^{2}\) values. In this case, rather than using \({R}_{FS}^{2}\) as our predictive power indicator, we use \({R}_{Pred}^{2}=\max (1-E(\widehat{Er{r}_{y}})/Var({Y}_{FS}),0)\).
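
A minimal sketch of the two indicators (Python; err_fs_hat and err_pred_hat are placeholders for the estimated full set and predictive errors, respectively):

    import numpy as np

    def r2_indicator(err_hat, y_fs):
        """R^2-style indicator: one minus an error estimate over the full-set
        output variance, floored at zero (the mean is the fallback predictor)."""
        return max(1.0 - err_hat / np.var(y_fs), 0.0)

    # R2_FS uses the weighted full set error estimate; R2_Pred uses only the
    # predictive error estimate, as in this appendix:
    # r2_fs = r2_indicator(err_fs_hat, y_fs); r2_pred = r2_indicator(err_pred_hat, y_fs)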

Fig. 14 Best Method’s \({R}_{Pred}^{2}\) as a function of relative subset size for selected industries. CAPNLS was chosen as the Best Method for industry codes 2899, 2010, 2811 and 2520, while CDA was chosen for industry code 1541

In Fig. 14, we observe that for the majority of the studied industries, weighting the in-sample error with the predictive error, the direct consequence of our finite full population assumption, does not affect the assessment of the mean predictive power of our production function models. Using our notation, this means that for most industries the expected \({R}_{Pred}^{2}\) for each given subsample size did not differ greatly from \({R}_{FS}^{2}\). However, the \({R}_{Pred}^{2}\) figures have significantly higher variance than their \({R}_{FS}^{2}\) counterparts for each subsample size. For all industries except industry code 2811, we obtain at least one replicate with negligible predictive power. For some industries, such as Industry Codes 2520 and 1541, this causes the predictive power bounds to be very wide (although the upper bound and mean values increase monotonically in the subsample size).

1.7 Comprehensive application results: MPSS, MP, and MRTS

Tables 34–39 provide more detailed results regarding MPSS, MP for both capital and labor, and MRTS for different percentiles of the four inputs (capital, labor, energy, and services). This information was summarized in Tables 5–9 in the main text.

Table 34 Most productive scale size (\(y\))
Table 35 Marginal product of capital
Table 36 Marginal product of capital
Table 37 Marginal product of labor
Table 38 Marginal product of labor
Table 39 Marginal rate of technical substitution (\(K/L\))

Cite this article

Preciado Arreola, J.L., Yagi, D. & Johnson, A.L. Insights from machine learning for evaluating production function estimators on manufacturing survey data. J Prod Anal 53, 181–225 (2020). https://doi.org/10.1007/s11123-019-00570-9
