Paper Title

Visual Object Recognition in Indoor Environments Using Topologically Persistent Features

Authors

Samani, Ekta U., Yang, Xingjian, Banerjee, Ashis G.

Abstract

Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots. In this letter, we propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge. In particular, we extract two kinds of features, namely, sparse persistence image (PI) and amplitude, by applying persistent homology to multi-directional height function-based filtrations of the cubical complexes representing the object segmentation maps. The features are then used to train a fully connected network for recognition. For performance evaluation, in addition to a widely used shape dataset and a benchmark indoor scenes dataset, we collect a new dataset, comprising scene images from two different environments, namely, a living room and a mock warehouse. The scenes are captured using varying camera poses under different illumination conditions and include up to five different objects from a given set of fourteen objects. On the benchmark indoor scenes dataset, sparse PI features show better recognition performance in unseen environments than the features learned using the widely used ResNetV2-56 and EfficientNet-B4 models. Further, they provide slightly higher recall and accuracy values than Faster R-CNN, an end-to-end object detection method, and its state-of-the-art variant, Domain Adaptive Faster R-CNN. The performance of our methods also remains relatively unchanged from the training environment (living room) to the unseen environment (mock warehouse) in the new dataset. In contrast, the performance of the object detection methods drops substantially. We also implement the proposed method on a real-world robot to demonstrate its usefulness.
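
As an illustration of the filtration step described in the abstract, the following is a minimal Python sketch, not the authors' implementation: it builds a height function-based filtration of the cubical complex associated with a binary object segmentation map and computes its persistence intervals with the GUDHI library. The helper name, the toy mask, and the choice of eight filtration directions are assumptions made for illustration; the subsequent derivation of sparse persistence image (PI) and amplitude features from these diagrams is omitted.

import numpy as np
import gudhi


def height_filtration_diagram(seg_map, direction):
    # Height of every pixel along the given unit direction vector (illustrative helper).
    rows, cols = np.indices(seg_map.shape)
    heights = cols * direction[0] + rows * direction[1]
    span = heights.max() - heights.min()
    heights = (heights - heights.min()) / (span if span > 0 else 1.0)  # normalize to [0, 1]
    # Background pixels get +inf so only the object enters the sublevel filtration.
    filt = np.where(seg_map > 0, heights, np.inf)
    cubical = gudhi.CubicalComplex(top_dimensional_cells=filt)
    cubical.persistence()  # compute persistent homology of the filtration
    return cubical.persistence_intervals_in_dimension(0)  # 0-dimensional (birth, death) pairs


# Toy usage: one rectangular "object" mask, filtered from eight directions.
seg = np.zeros((64, 64), dtype=np.uint8)
seg[16:48, 20:44] = 1
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
diagrams = [height_filtration_diagram(seg, (np.cos(a), np.sin(a))) for a in angles]

In the paper's pipeline, the diagrams obtained from multiple directions are summarized as sparse PI and amplitude features, which then serve as the input to a fully connected recognition network.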
