LASSO+DEA for small and big wide data


Omega (IF 6.7), Pub Date: 2021-01-21, DOI: 10.1016/j.omega.2021.102419
Ya Chen, Mike Tsionas, Valentin Zelenyuk

In data envelopment analysis (DEA), the curse of dimensionality may jeopardize the accuracy or even the relevance of results when the dimension of inputs and outputs is relatively large, even for relatively large samples. Recently, a machine learning approach based on the least absolute shrinkage and selection operator (LASSO) for variable selection was combined with sign-constrained convex nonparametric least squares (SCNLS, a special case of DEA) and dubbed LASSO-SCNLS, as a way to circumvent the curse of dimensionality. In this paper, we revisit this interesting approach by considering various data generating processes. We also explore a more advanced version of LASSO, the so-called elastic net (EN) approach, adapt it to DEA, and propose the EN-DEA. Our Monte Carlo simulations provide additional and, to some extent, new evidence and conclusions. In particular, we find that none of the considered approaches clearly dominates the others. To circumvent the curse of dimensionality of DEA in the context of big wide data, we also propose a simplified two-step approach, which we call LASSO+DEA. We find that the proposed simplified approach could be more useful than the existing, more sophisticated approaches for reducing very large dimensions into sparser, more parsimonious DEA models that attain greater discriminatory power and suffer less from the curse of dimensionality.
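The abstract does not spell out the two steps, so the following is only a rough sketch of what a "select with LASSO, then run DEA" pipeline could look like, not the authors' implementation. It assumes a single output, a non-negative LASSO fitted by plain coordinate descent for the selection step, and an input-oriented, constant-returns-to-scale DEA model solved with `scipy.optimize.linprog` for the second step. The function names `lasso_select` and `dea_input_efficiency`, the simulated data, and the penalty `alpha=2.0` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def lasso_select(X, y, alpha, n_iter=200):
    """Step 1 (assumed form): non-negative LASSO by coordinate descent.

    Minimises (1/2n)||y - Xb||^2 + alpha * sum(b) subject to b >= 0 and
    returns the indices of columns with a non-zero coefficient.
    Assumes no column of X is constant.
    """
    n, p = X.shape
    Xc, resid = X - X.mean(axis=0), y - y.mean()  # centring removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = max(rho - alpha, 0.0) / col_sq[j]  # soft-threshold, clipped at zero
            resid += Xc[:, j] * (beta[j] - new)      # keep resid = y - X @ beta
            beta[j] = new
    return np.flatnonzero(beta > 1e-8)


def dea_input_efficiency(X, Y):
    """Step 2 (assumed form): input-oriented CRS DEA score for every unit.

    For unit o: min theta  s.t.  sum_k lam_k x_k <= theta * x_o,
                                 sum_k lam_k y_k >= y_o,  lam >= 0.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    cost = np.r_[1.0, np.zeros(n)]                 # variables: [theta, lam_1..lam_n]
    for o in range(n):
        A_in = np.c_[-X[o], X.T]                   # input constraints, shape (m, n+1)
        A_out = np.c_[np.zeros(s), -Y.T]           # output constraints, shape (s, n+1)
        res = linprog(cost, A_ub=np.r_[A_in, A_out],
                      b_ub=np.r_[np.zeros(m), -Y[o]], method="highs")
        scores[o] = res.fun
    return scores


# Simulated "wide" data: 40 candidate inputs, only the first three matter.
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.uniform(1.0, 10.0, size=(n, p))
u = np.abs(rng.normal(0.0, 0.2, size=n))           # one-sided inefficiency term
y = X[:, :3].sum(axis=1) * np.exp(-u)

selected = lasso_select(X, y, alpha=2.0)           # step 1: reduce the dimension
scores = dea_input_efficiency(X[:, selected], y[:, None])  # step 2: DEA on the rest
print("selected inputs:", selected)
print("mean efficiency:", scores.mean())
```

The penalty here is fixed by hand; in practice it would more likely be tuned, for example by cross-validation, and the inputs might be standardised before the selection step. Running DEA on all 40 inputs instead of the selected subset is a quick way to see the loss of discriminatory power that the abstract refers to.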


Updated: 2021-01-21
