On the Sample Complexity of HGR Maximal Correlation Functions for Large Datasets
IEEE Transactions on Information Theory ( IF 2.2 ) Pub Date : 2020-12-14 , DOI: 10.1109/tit.2020.3044622
Shao-Lun Huang , Xiangxiang Xu

The Hirschfeld–Gebelein–Rényi (HGR) maximal correlation and the corresponding functions have been shown useful in many machine learning scenarios. In this paper, we study the sample complexity of estimating the HGR maximal correlation functions by the alternating conditional expectation (ACE) algorithm using training samples from large datasets. Specifically, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated by the ACE algorithm. For both supervised and semi-supervised learning scenarios, we establish analytical expressions for the error exponents of the learning errors. Furthermore, we demonstrate that for large datasets, the upper bounds on the sample complexity of learning the HGR maximal correlation functions by the ACE algorithm can be expressed using the established error exponents. Moreover, with our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning under a total sampling budget constraint, and an optimal sampling strategy is developed to maximize the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
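The ACE algorithm referenced in the abstract alternates two conditional-expectation updates to find the function pair maximizing the correlation. The following sketch illustrates the idea for discrete samples; the toy channel (Y is a noisy cyclic shift of X), the sample size, and the function name are invented for illustration and are not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: Y = (X + Z) mod 4 with Z uniform on {0, 1}.
n = 50_000
x = rng.integers(0, 4, size=n)
y = (x + rng.integers(0, 2, size=n)) % 4

def ace_maximal_correlation(x, y, iters=100):
    """Estimate the HGR maximal correlation of (X, Y) from samples by
    alternating empirical conditional expectations: a power iteration
    on the empirical conditional-expectation operator."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    g = rng.standard_normal(len(ys))              # random initialization of g(Y)
    for _ in range(iters):
        # f(x) <- E_hat[g(Y) | X = x], then center and normalize under P_hat_X
        f = (np.bincount(x_idx, weights=g[y_idx], minlength=len(xs))
             / np.bincount(x_idx, minlength=len(xs)))
        f = (f - f[x_idx].mean()) / f[x_idx].std()
        # g(y) <- E_hat[f(X) | Y = y], then center and normalize under P_hat_Y
        g = (np.bincount(y_idx, weights=f[x_idx], minlength=len(ys))
             / np.bincount(y_idx, minlength=len(ys)))
        g = (g - g[y_idx].mean()) / g[y_idx].std()
    return float(np.mean(f[x_idx] * g[y_idx]))    # E_hat[f(X) g(Y)]

rho_hat = ace_maximal_correlation(x, y)
print(rho_hat)  # for this channel the true value is 1/sqrt(2) ≈ 0.707
```

The centering step projects out the constant (trivial) singular function each iteration, so the power iteration converges to the leading nontrivial pair; the sample-complexity question studied in the paper is how fast `rho_hat` and the estimated functions approach their population counterparts as n grows.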

Updated: 2021-02-19