Online Censoring for Large-Scale Regressions with Application to Streaming Big Data
IEEE Transactions on Signal Processing (IF 5.4), Pub Date: 2016-08-01, DOI: 10.1109/tsp.2016.2546225
Dimitris Berberidis, Vassilis Kekatos, Georgios B. Giannakis

On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
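The idea of omitting less informative observations while estimating online can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain first-order (LMS-type) stochastic approximation update and a hypothetical censoring rule that skips any observation whose absolute prediction residual is at most `tau * noise_std`. The function names, step size, threshold, and synthetic data generator are all assumptions made for illustration.

```python
import random


def censored_lms(stream, dim, step=0.05, tau=1.0, noise_std=1.0):
    """First-order update that skips observations with small residuals.

    Assumed rule (illustrative only): an observation (x, y) is censored
    when |y - x^T theta| <= tau * noise_std.
    """
    theta = [0.0] * dim
    used = 0
    for x, y in stream:
        residual = y - sum(t * xi for t, xi in zip(theta, x))
        if abs(residual) <= tau * noise_std:
            continue  # censored: the current estimate already explains y well
        # LMS step along the regressor, scaled by the residual
        theta = [t + step * residual * xi for t, xi in zip(theta, x)]
        used += 1
    return theta, used


def synthetic_stream(theta_true, n, noise_std=1.0, seed=0):
    """Yield n pairs (x, y) with y = x^T theta_true + Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_true]
        noise = rng.gauss(0.0, noise_std)
        y = sum(t * xi for t, xi in zip(theta_true, x)) + noise
        yield x, y


theta_true = [1.0, -2.0, 0.5]
n_samples = 5000
theta_hat, used = censored_lms(
    synthetic_stream(theta_true, n_samples, noise_std=0.1),
    dim=len(theta_true),
    step=0.05,
    tau=1.0,
    noise_std=0.1,
)
print("estimate:", [round(t, 3) for t in theta_hat])
print("observations used:", used, "of", n_samples)
```

Raising `tau` censors more observations and so saves computation, at the cost of a coarser estimate. The second-order algorithms and the thresholding rules mentioned in the abstract are not reproduced here.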

Updated: 2016-08-01