Paper Title
SnapshotNet: Self-supervised Feature Learning for Point Cloud Data Segmentation Using Minimal Labeled Data
Paper Authors
Paper Abstract
Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce the reliance on labeled data, a new model called SnapshotNet is proposed as a self-supervised feature learning approach that works directly on the unlabeled point cloud data of a complex 3D scene. The SnapshotNet pipeline includes three stages. In the snapshot capturing stage, snapshots, defined as local collections of points, are sampled from the point cloud scene. A snapshot can be a view of a local 3D scan captured directly from the real scene, or a virtual view sampled from a large 3D point cloud dataset. Snapshots can also be sampled at different sampling rates or fields of view (FOVs), yielding multi-FOV snapshots, to capture scale information from the scene. In the feature learning stage, a new pretext task called multi-FOV contrasting is proposed to recognize whether or not two snapshots come from the same object, either within the same FOV or across different FOVs. Snapshots go through two self-supervised learning steps: a contrastive learning step with both part and scale contrasting, followed by a snapshot clustering step that extracts higher-level semantic features. A weakly supervised segmentation stage is then implemented by first training a standard SVM classifier on the learned features with a small fraction of labeled snapshots. The trained SVM predicts labels for input snapshots, and the predicted labels are converted into point-wise label assignments for semantic segmentation of the entire scene using a voting procedure. Experiments are conducted on the Semantic3D dataset, and the results show that the proposed method can learn effective features from snapshots of complex scene data without any labels. Moreover, the proposed method shows advantages when compared to the state-of-the-art (SOA) method for weakly supervised point cloud semantic segmentation.
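As an illustration of the snapshot capturing stage, below is a minimal NumPy sketch of sampling multi-FOV snapshots from a point cloud. The function names, the fixed snapshot size, and the FOV radii are hypothetical choices for illustration; the abstract does not specify the paper's actual sampling parameters.

```python
import numpy as np

def sample_snapshot(points, center, fov_radius, n_points=1024):
    # Collect all points within fov_radius of the center, then resample
    # to a fixed size so snapshots can be batched for feature learning.
    dists = np.linalg.norm(points[:, :3] - center[:3], axis=1)
    local = points[dists < fov_radius]
    if len(local) == 0:
        return None
    idx = np.random.choice(len(local), n_points, replace=len(local) < n_points)
    return local[idx]

def multi_fov_snapshots(points, n_centers=100, fovs=(1.0, 2.0, 4.0)):
    # Sample snapshots around random centers at several FOV radii
    # (multi-FOV snapshots) to capture scale information from the scene.
    centers = points[np.random.choice(len(points), n_centers, replace=False)]
    return [(i, r, sample_snapshot(points, c, r))
            for i, c in enumerate(centers)
            for r in fovs]

points = np.random.rand(20000, 3) * 10.0   # stand-in scene
snaps = multi_fov_snapshots(points)
```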
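The multi-FOV contrasting pretext task is described only at a high level. A generic contrastive objective of the kind commonly used for such tasks is the NT-Xent loss, sketched below in PyTorch; this is a standard formulation, not necessarily the paper's exact loss, and `z1[i]`/`z2[i]` are assumed to be embeddings of two snapshots of the same object, possibly taken at different FOVs.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    # z1[i] and z2[i]: embeddings of two snapshots of the same object
    # (possibly at different FOVs); every other pair acts as a negative.
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d) unit vectors
    sim = z @ z.t() / temperature                 # pairwise cosine similarity
    sim.fill_diagonal_(float("-inf"))             # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = nt_xent_loss(z1, z2)
```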
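For the weakly supervised segmentation stage, the following is a sketch of the SVM-plus-voting procedure using scikit-learn with synthetic stand-in data. The feature dimensions, snapshot membership, and label counts are hypothetical; in the actual pipeline the features would come from the self-supervised learning stage.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_points, n_classes, n_snapshots, feat_dim = 5000, 8, 300, 128

# Stand-in learned features and snapshot membership (which points each
# snapshot contains); these come from the earlier stages in practice.
features = rng.normal(size=(n_snapshots, feat_dim))
members = [rng.choice(n_points, 64, replace=False) for _ in range(n_snapshots)]
labels = rng.integers(0, n_classes, size=n_snapshots)   # stand-in labels

# Train a standard SVM on a small fraction of labeled snapshots.
labeled = rng.choice(n_snapshots, 30, replace=False)
svm = SVC(kernel="rbf").fit(features[labeled], labels[labeled])

# Predict a label per snapshot, then vote per point: each point takes the
# most frequent predicted label among the snapshots that contain it.
votes = np.zeros((n_points, n_classes), dtype=np.int64)
for idx, pred in zip(members, svm.predict(features)):
    votes[idx, pred] += 1
point_labels = votes.argmax(axis=1)   # point-wise semantic segmentation
```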