Symmetric Norm Estimation and Regression on Sliding Windows
arXiv - CS - Data Structures and Algorithms Pub Date : 2021-09-03 , DOI: arxiv-2109.01635
Vladimir Braverman, Viska Wei, Samson Zhou

The sliding window model generalizes the standard streaming model and often performs better in applications where recent data is more important or more accurate than data that arrived prior to a certain time. We study the problem of approximating symmetric norms (norms on $\mathbb{R}^n$ that are invariant under sign-flips and coordinate-wise permutations) in the sliding window model, where only the $W$ most recent updates define the underlying frequency vector. Whereas standard norm estimation algorithms for sliding windows rely on the smooth histogram framework of Braverman and Ostrovsky (FOCS 2007), analyzing the smoothness of general symmetric norms seems to be a challenging obstacle. Instead, we observe that the symmetric norm streaming algorithm of Braverman et al. (STOC 2017) can be reduced to identifying and approximating the frequency of heavy-hitters in a number of substreams. We introduce a heavy-hitter algorithm that gives a $(1+\epsilon)$-approximation to each of the reported frequencies in the sliding window model, thus obtaining the first algorithm for general symmetric norm estimation in the sliding window model. Our algorithm is a universal sketch that simultaneously approximates all symmetric norms in a parametrizable class and also improves upon the smooth histogram framework for estimating $L_p$ norms, for a range of large $p$. Finally, we consider the overconstrained linear regression problem in the case where the loss function is an Orlicz norm, a symmetric norm that can be interpreted as a scale-invariant version of $M$-estimators. We give the first sublinear space algorithms that produce $(1+\epsilon)$-approximate solutions to the linear regression problem for loss functions that are Orlicz norms in both the streaming and sliding window models.
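To make the setting concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It stores the whole window, so it uses linear space, whereas the abstract's contribution is doing this approximately in sublinear space. It maintains the exact frequency vector over the $W$ most recent updates and evaluates two symmetric norms on it. The names `ExactSlidingWindow`, `lp_norm`, and `top_k_norm` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


class ExactSlidingWindow:
    """Exact frequency vector over the W most recent updates.

    Baseline for illustration only: it keeps every item in the window,
    so its space is linear in W (assumption: insertion-only updates).
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.freq = Counter()

    def update(self, item):
        # Add the new update, then expire the oldest one if the window is full.
        self.window.append(item)
        self.freq[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.freq[expired] -= 1
            if self.freq[expired] == 0:
                del self.freq[expired]

    def frequency_vector(self, n):
        # Dense vector over the universe {0, ..., n-1}.
        return [self.freq.get(i, 0) for i in range(n)]


def lp_norm(x, p):
    """L_p norm: a symmetric norm for any p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def top_k_norm(x, k):
    """Sum of the k largest absolute values: another symmetric norm."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])


sw = ExactSlidingWindow(window_size=4)
for item in [0, 1, 1, 2, 2, 2, 0]:
    sw.update(item)

# Only the last four updates (2, 2, 2, 0) define the frequency vector.
f = sw.frequency_vector(n=3)
print(f, lp_norm(f, 2), top_k_norm(f, 2))
```

Both norms depend only on the multiset of absolute values of the coordinates, which is what invariance under sign-flips and coordinate permutations means: permuting `f` or negating any entry leaves `lp_norm` and `top_k_norm` unchanged.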

Updated: 2021-09-06