论文标题
两级空间内存索引
A Two-level Spatial In-Memory Index
论文作者
论文摘要
大量的空间数据越来越多地获得并需要有效的管理。尽管已经进行了数十年的空间数据管理研究,但很少有作品认为商品硬件的当前状态,具有相对较大的内存和并行多核处理的能力。在这项工作中,我们重新考虑了在这个新现实下的空间索引设计。具体而言,我们为具有空间范围的对象提出了一种主要内存索引方法,该方法基于经典的常规空间分配到分离砖中。我们索引的新颖性是每个瓷砖的内容进一步分为四类。这种二级分区不仅减少了计算结果所需的比较数量,而且还避免了生成和消除重复结果,这是基于分离空间分配的空间索引的固有问题。由我们的索引方案定义的空间分区是完全独立的,促进了轻松的并行评估,因为必须在分区之间进行同步或通信。我们展示了如何使用索引有效地处理空间范围查询并大大降低查询的完善步骤的成本。此外,我们研究了批处理和并行的众多范围查询的有效处理。对实际数据集的广泛实验证实了我们方法的效率。
Very large volumes of spatial data increasingly become available and demand effective management. While there has been decades of research on spatial data management, few works consider the current state of commodity hardware, having relatively large memory and the ability of parallel multi-core processing. In this work, we re-consider the design of spatial indexing under this new reality. Specifically, we propose a main-memory indexing approach for objects with spatial extent, which is based on a classic regular space partitioning into disjoint tiles. The novelty of our index is that the contents of each tile are further partitioned into four classes. This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem of spatial indexes based on disjoint space partitioning. The spatial partitions defined by our indexing scheme are totally independent, facilitating effortless parallel evaluation, as no synchronization or communication between the partitions is necessary. We show how our index can be used to efficiently process spatial range queries and drastically reduce the cost of the refinement step of the queries. In addition, we study the efficient processing of numerous range queries in batch and in parallel. Extensive experiments on real datasets confirm the efficiency of our approaches.