论文标题
最佳的圆形和样品尺寸的复杂性,用于平行分类
Optimal Round and Sample-Size Complexity for Partitioning in Parallel Sorting
论文作者
论文摘要
用于分布式内存架构的最新并行排序算法基于通过采样和直方图计算平衡分区。通过找到将分类键分为均匀尺寸的块的样品,这些算法可以最大程度地减少所需的通信次数。直方图(样品的计算位置)指导采样,从而减少收集的样品的总数。我们在计算平衡分区所需的采样/直方图回合的数量上得出了下限和上限。我们改进了先前的结果,以证明使用$ p $处理器时,$ O(\ log^* p)$ rounds用$ o(p/\ log^* p)$样品每回合就足够。我们将其与一个下限相匹配,该级别表明每回合具有$ O(p)$样本的任何算法都需要至少$ω(\ log^* p)$ rounds。此外,我们证明了一轮的$ω(p \ log p)$样品,因此证明了现有的一轮算法:样本排序,AMS排序和HSS具有最佳的样本量复杂性。为了获得下限,我们提出了一个硬随机输入分布,并从运行的分布理论中应用经典结果。
State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into evenly-sized chunks, these algorithms minimize the number of communication rounds required. Histogramming (computing positions of samples) guides sampling, enabling a decrease in the overall number of samples collected. We derive lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. We improve on prior results to demonstrate that when using $p$ processors, $O(\log^* p)$ rounds with $O(p/\log^* p)$ samples per round suffice. We match that with a lower bound that shows that any algorithm with $O(p)$ samples per round requires at least $Ω(\log^* p)$ rounds. Additionally, we prove the $Ω(p \log p)$ samples lower bound for one round, thus proving that existing one round algorithms: sample sort, AMS sort and HSS have optimal sample size complexity. To derive the lower bound, we propose a hard randomized input distribution and apply classical results from the distribution theory of runs.