Paper Title
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Paper Authors
Paper Abstract
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.
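The abstract describes SPVConv only at a high level: a sparse voxel branch paired with a high-resolution point-based branch. The NumPy sketch below is one way to picture that two-branch idea. It is not the authors' implementation. The function name `point_voxel_block`, its parameters, the per-voxel linear layer standing in for sparse convolution, the nearest-voxel devoxelization, and the fusion by addition are all assumptions made for illustration.

```python
import numpy as np

def point_voxel_block(coords, feats, voxel_size, w_voxel, w_point):
    """Toy two-branch block (hypothetical): coarse voxel branch + per-point branch."""
    # Voxelize: map each point to an integer voxel index.
    voxel_idx = np.floor(coords / voxel_size).astype(np.int64)
    # Keep only occupied voxels; `inverse` maps each point to its voxel row.
    _, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    num_voxels = inverse.max() + 1
    # Average the features of all points that fall into the same voxel.
    sums = np.zeros((num_voxels, feats.shape[1]))
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=num_voxels)[:, None]
    voxel_feats = sums / counts
    # Voxel branch: a per-voxel linear layer + ReLU, standing in for sparse convolution.
    voxel_out = np.maximum(voxel_feats @ w_voxel, 0.0)
    # Devoxelize: copy each voxel's output back to the points inside it.
    from_voxels = voxel_out[inverse]
    # Point branch: per-point linear layer + ReLU at full point resolution.
    from_points = np.maximum(feats @ w_point, 0.0)
    # Fuse the two branches by addition (an assumption of this sketch).
    return from_voxels + from_points

# Usage: three points, two of which share a voxel at voxel_size = 1.0.
coords = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1]])
feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
out = point_voxel_block(coords, feats, 1.0, np.eye(2), np.eye(2))
```

In this toy setup the first two points receive the same voxel-branch output because they share a voxel, while the point branch keeps their individual features distinct. That is the intuition behind keeping fine detail for small instances despite coarse voxelization.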