FISHING Net: Future Inference of Semantic Heatmaps In Grids

Paper Title

FISHING Net: Future Inference of Semantic Heatmaps In Grids

Authors

Hendy, Noureldin; Sloan, Cooper; Tian, Feng; Duan, Pengfei; Charchut, Nick; Xie, Yuesong; Wang, Chuang; Philbin, James

Abstract

For autonomous robots to navigate a complex environment, it is crucial to understand the surrounding scene both geometrically and semantically. Modern autonomous robots employ multiple sets of sensors, including lidars, radars, and cameras. Managing the sensors' different reference frames and characteristics and merging their observations into a single representation complicates perception. Choosing a single unified representation for all sensors simplifies the task of perception and fusion. In this work, we present an end-to-end pipeline that performs semantic segmentation and short-term prediction using a top-down representation. Our approach consists of an ensemble of neural networks that take in sensor data from different sensor modalities and transform them into a single common top-down semantic grid representation. We find this representation favorable because it is agnostic to sensor-specific reference frames and captures both the semantic and geometric information of the surrounding scene. Because the modalities share a single output representation, they can be easily aggregated to produce a fused output. In this work, we predict short-term semantic grids, but the framework can be extended to other tasks. This offers a simple, extensible, end-to-end approach for multi-modal perception and prediction.
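
The abstract claims that a shared top-down output makes fusion easy. The minimal sketch below illustrates that idea under stated assumptions. It assumes each modality network (lidar, radar, camera) already emits a grid of per-class probabilities with the same height, width, and class count. It fuses those grids with a simple per-cell average and then takes an argmax in each cell.

The averaging rule, the function names `fuse_grids` and `argmax_labels`, and the three-class layout in the usage lines are hypothetical illustrations. The abstract does not specify the aggregation scheme the authors actually use.

```python
from typing import List

# Hypothetical layout: grid[row][col][cls] is the probability of class `cls`
# in one top-down cell. Every modality is assumed to share the same H x W x C shape.
Grid = List[List[List[float]]]


def fuse_grids(grids: List[Grid]) -> Grid:
    """Average per-cell class probabilities across modalities.

    Averaging is an assumed, illustrative aggregation rule. It is only
    well-defined because all modalities emit grids of identical shape.
    """
    if not grids:
        raise ValueError("need at least one modality grid")
    h, w, c = len(grids[0]), len(grids[0][0]), len(grids[0][0][0])
    # Reject grids whose height, width, or class count differs from the first grid.
    for g in grids:
        if (len(g) != h
                or any(len(row) != w for row in g)
                or any(len(cell) != c for row in g for cell in row)):
            raise ValueError("all grids must share the same H x W x C shape")
    n = len(grids)
    return [
        [[sum(g[r][col][k] for g in grids) / n for k in range(c)] for col in range(w)]
        for r in range(h)
    ]


def argmax_labels(grid: Grid) -> List[List[int]]:
    """Return the index of the most probable class in each cell."""
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row] for row in grid]


# Usage: a 1 x 2 grid with three hypothetical classes (0=background, 1=vehicle, 2=pedestrian).
lidar = [[[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]]
radar = [[[0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]]
camera = [[[0.3, 0.4, 0.3], [0.8, 0.1, 0.1]]]
print(argmax_labels(fuse_grids([lidar, radar, camera])))  # → [[1, 0]]
```

The fusion step needs no per-sensor geometry because every input is already expressed in the same top-down frame. For short-term prediction, the same function could be applied independently to the grid for each predicted time step.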
