Paper title
Online Lewis Weight Sampling
Paper authors
Paper abstract
The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+ε)$ error. Several works have extended this important primitive to other settings, including the online coreset and sliding window models. However, these results are only for $p\in\{1,2\}$, and the results for $p=1$ require suboptimal $\tilde O(d^2/ε^2)$ samples. In this work, we design the first nearly optimal $\ell_p$ subspace embeddings for all $p\in(0,\infty)$ in the online coreset and sliding window models. In both models, our algorithms store $\tilde O(d^{1\lor(p/2)}/ε^2)$ rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020], and gives the first results for all $p\notin\{1,2\}$. Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/ε^2)$ for $p>2$. Previously, this scheme was only known to have sample complexity $\tilde O(d^{p/2}/ε^5)$, whereas $\tilde O(d^{p/2}/ε^2)$ is known only if a more sophisticated recursive sampling scheme is used. The recursive sampling cannot be implemented online, which necessitates an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. As an application, we obtain the first one-pass streaming coreset algorithms for $(1+ε)$-approximation of important generalized linear models, such as logistic regression and $p$-probit regression. Our upper bounds are parameterized by a complexity parameter $μ$ introduced by [MSSW2018], and we prove the first lower bounds showing that a linear dependence on $μ$ is necessary.
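To make the "one-shot" sampling scheme described above concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it assumes a dense matrix $A$, uses the standard Cohen-Peng fixed-point iteration for Lewis weights (which converges for $p\in(0,4)$), and samples each row independently with probability proportional to its Lewis weight. The function names, the oversampling constant `c`, and the omission of logarithmic factors and failure probabilities are all illustrative assumptions.

```python
# Sketch of one-shot l_p Lewis weight sampling (illustrative; constants/log factors schematic).
import numpy as np


def lewis_weights(A, p, num_iters=20):
    """Approximate l_p Lewis weights of the rows of A via the fixed-point iteration
    w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}, which converges for p in (0, 4)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(num_iters):
        # M = A^T diag(w)^{1-2/p} A
        M = A.T @ (w[:, None] ** (1.0 - 2.0 / p) * A)
        Minv = np.linalg.inv(M)
        # generalized leverage scores tau_i = a_i^T M^{-1} a_i
        tau = np.einsum("ij,jk,ik->i", A, Minv, A)
        w = np.clip(tau, 1e-12, None) ** (p / 2.0)
    return w


def one_shot_lewis_sample(A, p, eps, c=1.0, rng=None):
    """Keep each row independently with probability p_i = min(1, alpha * w_i), and rescale
    kept rows by p_i^{-1/p} so that ||SAx||_p^p is an unbiased estimate of ||Ax||_p^p."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    w = lewis_weights(A, p)
    # Since sum_i w_i = d, this keeps roughly c * d^{max(1, p/2)} * log(d) / eps^2 rows.
    alpha = c * max(1.0, d ** (p / 2.0 - 1.0)) * np.log(d + 1) / eps**2
    probs = np.minimum(1.0, alpha * w)
    keep = rng.random(n) < probs
    return A[keep] / probs[keep, None] ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20000, 10))
    S = one_shot_lewis_sample(A, p=3, eps=0.5, rng=rng)
    x = rng.standard_normal(10)
    print(S.shape[0], "rows kept; ratio ||SAx||_p^p / ||Ax||_p^p =",
          np.sum(np.abs(S @ x) ** 3) / np.sum(np.abs(A @ x) ** 3))
```

In the online coreset setting, the exact Lewis weights of the full matrix are not available when a row arrives, so an online variant would replace `lewis_weights` with online (monotone) weight estimates maintained from the prefix seen so far; the sketch above only illustrates the offline one-shot sampling and rescaling step.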