Paper title
HyperLogLogLog: Cardinality Estimation With One Log More
Paper authors
Paper abstract
We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that reduces the sketch from $O(m\log\log n)$ bits to $m \log_2\log_2\log_2 m + O(m+\log\log n)$ bits for estimating the number of distinct elements~$n$ using $m$~registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch; it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch remains mergeable in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming $n$ is sufficiently larger than $m$. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed a reduction of approximately 40% in sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to $m\log_2\log_2\log_2\log_2 m+O(m\log\log\log m/\log\log m+\log\log n)$ bits.
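The abstract does not spell out the encoding, so the following Python sketch is only an illustration of the general idea of a lossless, convertible register compression, not the paper's C++ implementation. It assumes a toy scheme in which each HyperLogLog register is stored as a small offset from a shared base value, with out-of-range registers kept in a sparse exception map. The function names (`hll_registers`, `compress`, `decompress`, `merge`, `approx_bits`), the 3-bit offset width, the SHA-256 hash, and the size accounting are all assumptions made for this example.

```python
import hashlib

def hll_registers(items, p=8):
    """Plain HyperLogLog registers: m = 2**p, each holds the max rank seen."""
    regs = [0] * (1 << p)
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = h >> (64 - p)                        # top p bits pick the register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1    # leading zeros + 1
        regs[idx] = max(regs[idx], rank)
    return regs

def merge(a, b):
    """Merging two sketches is an element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]

def compress(regs, offset_bits=3):
    """Toy scheme (assumed): base + small offsets + sparse exceptions.

    Tries every candidate base and keeps the one with the fewest exceptions.
    """
    span = 1 << offset_bits
    best = None
    for base in range(min(regs), max(regs) + 1):
        exceptions = {i: r for i, r in enumerate(regs)
                      if not (base <= r < base + span)}
        if best is None or len(exceptions) < len(best[2]):
            # Exception slots get a placeholder offset of 0.
            offsets = [r - base if base <= r < base + span else 0 for r in regs]
            best = (base, offsets, exceptions)
    return best

def decompress(base, offsets, exceptions):
    """Invert compress(): rebuild the original register array exactly."""
    regs = [base + o for o in offsets]
    for i, r in exceptions.items():
        regs[i] = r
    return regs

def approx_bits(offsets, exceptions, offset_bits=3, p=8, reg_bits=6):
    """Rough size estimate (assumed accounting): dense offsets, plus an
    (index, value) pair per exception, plus the base value."""
    return len(offsets) * offset_bits + len(exceptions) * (p + reg_bits) + reg_bits

regs = hll_registers(range(100_000))
base, offsets, exceptions = compress(regs)
print("uncompressed bits:", 6 * len(regs))
print("toy compressed bits:", approx_bits(offsets, exceptions))
print("round trip exact:", decompress(base, offsets, exceptions) == regs)
```

Because `decompress` rebuilds the register array exactly, any estimator applied to the uncompressed registers gives the same result after a round trip, which is the sense in which a scheme like this can serve as a drop-in replacement. The brute-force base search here is for clarity only and says nothing about the amortized constant-time updates claimed in the abstract.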