Audio-Visual Self-Supervised Terrain Type Discovery for Mobile Platforms

Paper Title

Audio-Visual Self-Supervised Terrain Type Discovery for Mobile Platforms

Authors

Akiyoshi Kurobe, Yoshikatsu Nakajima, Hideo Saito, Kris Kitani

Abstract

The ability to both recognize and discover terrain characteristics is an important function required for many autonomous ground robots such as social robots, assistive robots, autonomous vehicles, and ground exploration robots. Recognizing and discovering terrain characteristics is challenging because similar terrains may have very different appearances (e.g., carpet comes in many colors), while terrains with very similar appearance may have very different physical properties (e.g., mulch versus dirt). In order to address the inherent ambiguity in vision-based terrain recognition and discovery, we propose a multi-modal self-supervised learning technique that switches between audio features extracted from a microphone attached to the underside of a mobile platform and image features extracted by a camera on the platform to cluster terrain types. The terrain cluster labels are then used to train an image-based convolutional neural network to predict changes in terrain types. Through experiments, we demonstrate that the proposed self-supervised terrain type discovery method achieves over 80% accuracy, which greatly outperforms several baselines and suggests strong potential for assistive applications.
