Title
On The Memory Complexity of Uniformity Testing
Authors
Abstract
In this paper we consider the problem of uniformity testing with limited memory. We observe a sequence of independent identically distributed random variables drawn from a distribution $p$ over $[n]$, which is either uniform or is $\varepsilon$-far from uniform under the total variation distance, and our goal is to determine the correct hypothesis. At each time point we are allowed to update the state of a finite-memory machine with $S$ states, where each state of the machine is assigned one of the hypotheses, and we are interested in obtaining an asymptotic probability of error at most $\delta$, for some $0<\delta<1/2$, uniformly under both hypotheses. The main contribution of this paper is deriving upper and lower bounds on the number of states $S$ needed in order to achieve a constant error probability $\delta$, as a function of $n$ and $\varepsilon$: our upper bound is $O(\frac{n\log n}{\varepsilon})$ and our lower bound is $\Omega(n+\frac{1}{\varepsilon})$. Prior works in the field have almost exclusively used collision counting for upper bounds and the Paninski mixture for lower bounds. Somewhat surprisingly, in the limited-memory, unlimited-samples setup, the optimal solution does not involve counting collisions, and the Paninski prior is not hard. Thus, different proof techniques are needed in order to attain our bounds.
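To make the setup concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of the objects the abstract refers to: the uniform distribution over $[n]$, a Paninski-style $\varepsilon$-far alternative, the total variation distance, and a generic finite-memory machine whose states each carry a hypothesis label. The function names and the specific pairwise perturbation are illustrative assumptions.

```python
def uniform_dist(n):
    """Uniform distribution over [n], represented as a probability vector."""
    return [1.0 / n] * n

def paninski_dist(n, eps):
    """A Paninski-style mixture element: perturb symbol pairs by +/- 2*eps/n,
    which puts the result at total variation distance exactly eps from uniform
    (requires n even and 0 < eps <= 1/2). The abstract notes this classic
    hard prior is *not* hard in the limited-memory setting; it is shown here
    only to make the 'eps-far' alternative concrete."""
    assert n % 2 == 0 and 0 < eps <= 0.5
    p = uniform_dist(n)
    for i in range(0, n, 2):
        p[i] += 2 * eps / n
        p[i + 1] -= 2 * eps / n
    return p

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def run_machine(transition, decision, num_states, samples, start=0):
    """Drive a finite-memory machine with `num_states` states over a sample
    stream: `transition(state, x)` maps the current state and the next sample
    to a new state, and `decision(state)` assigns each state one of the two
    hypotheses ('uniform' or 'far'), as in the model described above."""
    state = start
    for x in samples:
        state = transition(state, x)
        assert 0 <= state < num_states
    return decision(state)
```

For example, `tv_distance(uniform_dist(10), paninski_dist(10, 0.25))` evaluates to `0.25`, matching the intended distance $\varepsilon$. Any concrete tester in this model is then a choice of `transition` and `decision`; the paper's results bound how large `num_states` must be as a function of $n$ and $\varepsilon$.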