Minimum Description Length Principle in Supervised Learning with Application to Lasso
IEEE Transactions on Information Theory (IF 2.5), Pub Date: 2020-07-01, DOI: 10.1109/tit.2020.2998577
Masanori Kawakita , Jun'ichi Takeuchi

The minimum description length (MDL) principle is extended to supervised learning. The MDL principle is the philosophy that the shortest description of given data leads to the best hypothesis about the data source. One of the key theories for the MDL principle is Barron and Cover’s theory (BC theory), which mathematically justifies the MDL principle based on two-stage codes in density estimation (unsupervised learning). Although the codelength of a two-stage code resembles the objective function of penalized likelihood methods, the parameter optimization of penalized likelihood methods is performed without quantization of the parameter space. Recently, Chatterjee and Barron provided theoretical tools to extend BC theory to penalized likelihood methods by overcoming this difference. Indeed, applying their tools, they showed that the famous penalized likelihood method ‘lasso’ can be interpreted as an MDL estimator and enjoys a performance guarantee from BC theory. An important fact is that their results assume a fixed design setting, which is essentially the same as unsupervised learning. The fixed design is natural if lasso is used for compressed sensing. If lasso is used for supervised learning, however, the fixed design is considerably unsatisfactory; only the random design is acceptable. Extending BC theory to the random design is inherently difficult, regardless of whether the parameter space is quantized. In this paper, a novel theoretical tool for extending BC theory to supervised learning (the random design setting with no quantization of the parameter space) is provided. Applying this tool, it is proved that, when the covariates follow a Gaussian distribution, lasso in the random design setting can also be interpreted as an MDL estimator and enjoys the risk bound of BC theory. The risk/regret bounds obtained have several advantages inherited from BC theory. First, the bounds require remarkably few assumptions. Second, the bounds hold for any finite sample size $n$ and any finite feature number $p$, even if $n \ll p$. The behavior of the regret bound is investigated by numerical simulations. We believe that this is the first extension of BC theory to supervised learning (random design).
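As a rough sketch of the correspondence alluded to above (the notation here is ours, not the paper’s): a two-stage code first describes a quantized parameter $\tilde\theta$ with codelength $L(\tilde\theta)$ and then the data given $\tilde\theta$, whereas lasso minimizes a penalized negative log-likelihood over the unquantized parameter space:

$$\hat\theta_{\mathrm{MDL}} = \operatorname*{arg\,min}_{\tilde\theta \in \tilde\Theta} \Big\{ L(\tilde\theta) - \log p_{\tilde\theta}(y \mid x) \Big\}, \qquad \hat\beta_{\mathrm{lasso}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \Big\{ \tfrac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \Big\}.$$

For Gaussian noise, $\tfrac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2$ equals $-\log p_{\beta}(y \mid X)$ up to an additive constant, so the lasso objective has the two-stage form provided $\lambda\lVert\beta\rVert_1$ can be justified as a codelength; making that justification rigorous without quantization is exactly the gap that Chatterjee and Barron’s tools address.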
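To make the fixed-versus-random design distinction concrete, below is a minimal Python sketch (our illustration under assumed values, not the paper’s simulation code) of lasso in the random design setting: Gaussian covariates, $n \ll p$, and an excess risk evaluated on fresh covariate draws, which is precisely what the fixed design setting cannot express.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Random design: covariates drawn i.i.d. from a Gaussian distribution,
# with n << p. All concrete values here are assumptions for illustration.
rng = np.random.default_rng(0)
n, p, sigma = 50, 500, 1.0
beta_true = np.zeros(p)
beta_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]   # sparse truth (assumed)

X = rng.standard_normal((n, p))               # Gaussian covariates
y = X @ beta_true + sigma * rng.standard_normal(n)

# Lasso as a penalized likelihood method:
# minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 over unquantized b.
beta_hat = Lasso(alpha=0.1).fit(X, y).coef_

# In the random design, risk is measured on fresh draws from the same
# covariate distribution (in the fixed design, only the observed X matters).
X_new = rng.standard_normal((10_000, p))
excess_risk = np.mean((X_new @ (beta_hat - beta_true)) ** 2)
print(f"nonzeros: {np.count_nonzero(beta_hat)}, excess risk: {excess_risk:.3f}")
```

Repeating the fit over many draws of $(X, y)$ and comparing the averaged excess risk against a bound of the form obtained in the paper is the kind of numerical check the abstract refers to.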

Updated: 2020-07-01