Paper Title

On Adaptive Distance Estimation

Paper Authors

Yeshwanth Cherapanamjeri, Jelani Nelson

Paper Abstract

We provide a static data structure for distance estimation which supports {\it adaptive} queries. Concretely, given a dataset $X = \{x_i\}_{i=1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$, outputs a $(1+\epsilon)$-approximation of $\lVert q - x_i \rVert_p$ with high probability for all $i \in [n]$. The main novelty is that our data structure's correctness guarantee holds even when the sequence of queries is chosen adaptively: an adversary is allowed to choose the $j$th query point $q_j$ in a way that depends on the answers reported by the data structure for $q_1, \ldots, q_{j-1}$. Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is $\tilde{O}((n+d)d/\epsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with the benefit that our time to answer queries is only $\tilde{O}(\epsilon^{-2}(n+d))$, much faster than the naive $\Theta(nd)$ time obtained from a linear scan when $n$ and $d$ are very large. Here $\tilde{O}$ hides $\log(nd/\epsilon)$ factors. We discuss applications to nearest neighbor search and nonparametric estimation. Our method is simple and likely to be applicable to other domains: we describe a generic approach for transforming randomized Monte Carlo data structures which do not support adaptive queries into ones that do, and show that for the problem at hand, it can be applied to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.
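To make the generic transformation concrete, here is a minimal Python sketch of the idea for the $p = 2$ case, written under stated assumptions rather than as the paper's actual construction: store many independent copies of a standard nonadaptive sketch (Gaussian Johnson-Lindenstrauss projections), and answer each query from a freshly sampled random subset of those copies via a median, so that the reported answers leak little about the randomness of any fixed copy. The names build_sketches and query, and the constants n_copies and n_sample, are hypothetical illustrations; the paper's actual parameter choices come from its analysis and are not specified in the abstract.

import numpy as np

rng = np.random.default_rng(0)

def build_sketches(X, eps=0.25, n_copies=64):
    # Preprocess: store n_copies independent Gaussian (JL) sketches of the
    # dataset. Each copy alone is a standard nonadaptive l_2 estimator;
    # the redundancy is what the adaptive wrapper draws on at query time.
    n, d = X.shape
    m = int(np.ceil(8 / eps**2))  # sketch dimension scales as 1/eps^2
    Pis = [rng.normal(size=(m, d)) / np.sqrt(m) for _ in range(n_copies)]
    SX = [X @ Pi.T for Pi in Pis]  # sketch every data point once, up front
    return Pis, SX

def query(q, Pis, SX, n_sample=9):
    # Answer one (possibly adversarially chosen) query: draw a fresh random
    # subset of the stored sketches and report the median of their
    # estimates. The fresh per-query randomness is the ingredient that the
    # generic transformation uses against adaptively chosen queries.
    idx = rng.choice(len(Pis), size=n_sample, replace=False)
    ests = [np.linalg.norm(SX[i] - q @ Pis[i].T, axis=1) for i in idx]
    return np.median(ests, axis=0)  # estimated ||q - x_i||_2 for every i

# Toy usage on synthetic data.
X = rng.normal(size=(100, 50))
Pis, SX = build_sketches(X)
q = rng.normal(size=50)
approx = query(q, Pis, SX)
exact = np.linalg.norm(X - q[None, :], axis=1)
print(np.max(np.abs(approx - exact) / exact))  # max relative error, on the order of eps

For general $0 < p \leq 2$ the same wrapper would sit on top of $p$-stable sketches with a median estimator. The point of the subsampling is that each query touches only a few low-dimensional sketches, which is how a $\tilde{O}(\epsilon^{-2}(n+d))$-type query time can coexist with correctness against adaptive queries.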
