论文标题

覆盖高维数字和量化

Covering of high-dimensional cubes and quantization

论文作者

Zhigljavsky, Anatoly, Noonan, Jack

论文摘要

作为主要问题,我们考虑覆盖$ d $二维多维数据集,$ n $ balls,$ n $ d $ $ d $(10或更多)且相当小的$ n $,例如$ n = 100 $或$ n = 1000 $。我们不需要全部覆盖范围,而只需要90 \%或95 \%覆盖范围。我们确定有效覆盖方案具有几个重要属性,这些属性在很大的$ n $中没有在小维度和渐近考虑方面看到。这些属性之一可以称为“不要试图覆盖顶点”,因为多维数据集的顶点及其近距离社区很难覆盖,而且对于大$ d $来说,其中很多。我们清楚地表明,与普遍的信念相反,将球放在立方体中构成低分配序列的点上,这使得覆盖方案效率低下。对于一个随机覆盖的家族,我们能够为覆盖范围提供非常准确的近似值。然后,我们将结果扩展到较小的立方体和量化的覆盖范围的问题,后者也称为设施位置。除了理论上的考虑和近似值的推导外,我们还讨论了大规模数值研究的结果。

As the main problem, we consider covering of a $d$-dimensional cube by $n$ balls with reasonably large $d$ (10 or more) and reasonably small $n$, like $n=100$ or $n=1000$. We do not require the full coverage but only 90\% or 95\% coverage. We establish that efficient covering schemes have several important properties which are not seen in small dimensions and in asymptotical considerations, for very large $n$. One of these properties can be termed `do not try to cover the vertices' as the vertices of the cube and their close neighbourhoods are very hard to cover and for large $d$ there are far too many of them. We clearly demonstrate that, contrary to a common belief, placing balls at points which form a low-discrepancy sequence in the cube, makes for a very inefficient covering scheme. For a family of random coverings, we are able to provide very accurate approximations to the coverage probability. We then extend our results to the problems of coverage of a cube by smaller cubes and quantization, the latter being also referred to as facility location. Along with theoretical considerations and derivation of approximations, we discuss results of a large-scale numerical investigation.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源