FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution

Paper Title

FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution

Authors

Zhanpeng Zhang, Kaipeng Zhang

Abstract

Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context, so performing multi-scale context aggregation within a limited computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) that efficiently leverages context information. Second, for runtime efficiency, state-of-the-art methods quickly decrease the spatial size of the inputs or feature maps in the early network stages, and the final high-resolution result is usually obtained by a non-parametric up-sampling operation (e.g. bilinear interpolation). In contrast, we rethink this pipeline and treat it as a super-resolution process. We use an optimized super-resolution operation in the up-sampling step and improve the accuracy, especially in the sub-sampled input image scenario used for real-time applications. By fusing the above two improvements, our method provides a better latency-accuracy trade-off than other state-of-the-art methods. In particular, we achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card. The proposed module can be plugged into any feature extraction CNN and benefits from advances in CNN structure design.
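
The abstract does not spell out how CF-ASPP is built, so the following is only a back-of-the-envelope sketch of why a cascaded, factorized atrous design can be cheap while still seeing a wide context. It assumes that "factorized" means splitting a 3x3 atrous convolution into a depthwise 3x3 atrous convolution plus a pointwise 1x1 convolution, and that "cascaded" means stacking stride-1 atrous layers. The channel width (256) and the dilation rates (6, 12, 18) are hypothetical values chosen for illustration, not figures from the paper.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weight count of a dense k x k convolution (bias ignored).

    Dilation spreads the taps apart but does not add weights.
    """
    return k * k * c_in * c_out


def factorized_conv_params(c_in, c_out, k=3):
    """Weight count when the k x k conv is split into a depthwise
    k x k (atrous) conv plus a pointwise 1 x 1 conv (bias ignored).

    Assumption: this is what "factorized" refers to in CF-ASPP.
    """
    return k * k * c_in + c_in * c_out


def cascaded_receptive_field(rates, k=3):
    """Receptive field (per side, in pixels) of stacked stride-1
    k x k atrous convolutions with the given dilation rates."""
    rf = 1
    for d in rates:
        rf += (k - 1) * d
    return rf


if __name__ == "__main__":
    c_in = c_out = 256          # hypothetical channel width
    rates = (6, 12, 18)         # hypothetical dilation rates

    dense = standard_conv_params(c_in, c_out)
    fact = factorized_conv_params(c_in, c_out)
    print("dense 3x3 weights:     ", dense)
    print("factorized 3x3 weights:", fact)
    print("reduction factor:       %.1fx" % (dense / fact))

    # One atrous layer at the largest rate vs. all three stacked.
    print("single layer RF:", cascaded_receptive_field(rates[-1:]))
    print("cascaded RF:    ", cascaded_receptive_field(rates))
```

Under these assumptions, the factorized layer needs roughly an order of magnitude fewer weights than the dense one, and stacking the layers enlarges the receptive field beyond what any single atrous layer covers. The actual savings and context size in FarSee-Net depend on the configuration reported in the paper.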
