Paper title
Online Lewis Weight Sampling
Paper authors
Paper abstract
The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+ε)$ error. Several works have extended this important primitive to other settings, including the online coreset and sliding window models. However, these results are only for $p\in\{1,2\}$, and the results for $p=1$ require suboptimal $\tilde O(d^2/ε^2)$ samples. In this work, we design the first nearly optimal $\ell_p$ subspace embeddings for all $p\in(0,\infty)$ in the online coreset and sliding window models. In both models, our algorithms store $\tilde O(d^{1\lor(p/2)}/ε^2)$ rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020], and gives the first results for all $p\notin\{1,2\}$. Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/ε^2)$ for $p>2$. Previously, this scheme was only known to have sample complexity $\tilde O(d^{p/2}/ε^5)$, whereas $\tilde O(d^{p/2}/ε^2)$ is known only if a more sophisticated recursive sampling scheme is used. The recursive sampling cannot be implemented online, which necessitates an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. As an application, we obtain the first one-pass streaming coreset algorithms for $(1+ε)$-approximation of important generalized linear models, such as logistic regression and $p$-probit regression. Our upper bounds are parameterized by a complexity parameter $μ$ introduced by [MSSW2018], and we prove the first lower bounds showing that a linear dependence on $μ$ is necessary.
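To make the "one-shot" sampling scheme described above concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it assumes a dense matrix $A$, uses the standard Cohen-Peng fixed-point iteration for Lewis weights (which converges for $p\in(0,4)$), and samples each row independently with probability proportional to its Lewis weight. The function names, the oversampling constant `c`, and the omission of logarithmic factors and failure probabilities are all illustrative assumptions.

```python
# Sketch of one-shot l_p Lewis weight sampling (illustrative; constants/log factors schematic).
import numpy as np


def lewis_weights(A, p, num_iters=20):
    """Approximate l_p Lewis weights of the rows of A via the fixed-point iteration
    w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}, which converges for p in (0, 4)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(num_iters):
        # M = A^T diag(w)^{1-2/p} A
        M = A.T @ (w[:, None] ** (1.0 - 2.0 / p) * A)
        Minv = np.linalg.inv(M)
        # generalized leverage scores tau_i = a_i^T M^{-1} a_i
        tau = np.einsum("ij,jk,ik->i", A, Minv, A)
        w = np.clip(tau, 1e-12, None) ** (p / 2.0)
    return w


def one_shot_lewis_sample(A, p, eps, c=1.0, rng=None):
    """Keep each row independently with probability p_i = min(1, alpha * w_i), and rescale
    kept rows by p_i^{-1/p} so that ||SAx||_p^p is an unbiased estimate of ||Ax||_p^p."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    w = lewis_weights(A, p)
    # Since sum_i w_i = d, this keeps roughly c * d^{max(1, p/2)} * log(d) / eps^2 rows.
    alpha = c * max(1.0, d ** (p / 2.0 - 1.0)) * np.log(d + 1) / eps**2
    probs = np.minimum(1.0, alpha * w)
    keep = rng.random(n) < probs
    return A[keep] / probs[keep, None] ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20000, 10))
    S = one_shot_lewis_sample(A, p=3, eps=0.5, rng=rng)
    x = rng.standard_normal(10)
    print(S.shape[0], "rows kept; ratio ||SAx||_p^p / ||Ax||_p^p =",
          np.sum(np.abs(S @ x) ** 3) / np.sum(np.abs(A @ x) ** 3))
```

In the online coreset setting, the exact Lewis weights of the full matrix are not available when a row arrives, so an online variant would replace `lewis_weights` with online (monotone) weight estimates maintained from the prefix seen so far; the sketch above only illustrates the offline one-shot sampling and rescaling step.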