Paper Title

Towards Robust Semantic Segmentation of Accident Scenes via Multi-Source Mixed Sampling and Meta-Learning

Paper Authors

Xinyu Luo, Jiaming Zhang, Kailun Yang, Alina Roitberg, Kunyu Peng, Rainer Stiefelhagen

Paper Abstract

Autonomous vehicles utilize urban scene segmentation to understand the real world like a human and react accordingly. Semantic segmentation of normal scenes has experienced a remarkable rise in accuracy on conventional benchmarks. However, a significant portion of real-life accidents features abnormal scenes, such as those with object deformations, overturns, and unexpected traffic behaviors. Since even a small mis-segmentation of a driving scene can pose a serious threat to human lives, the robustness of such models in accident scenarios is an extremely important factor in ensuring the safety of intelligent transportation systems. In this paper, we propose a Multi-source Meta-learning Unsupervised Domain Adaptation (MMUDA) framework to improve the generalization of segmentation transformers to extreme accident scenes. In MMUDA, we make use of Multi-Domain Mixed Sampling to augment the images of the multiple source domains (normal scenes) with the appearance of the target data (abnormal scenes). To train our model, we intertwine and study a meta-learning strategy in the multi-source setting to robustify the segmentation results. We further enhance the segmentation backbone (SegFormer) with a HybridASPP decoder design, featuring large window attention spatial pyramid pooling and strip pooling, to efficiently aggregate long-range contextual dependencies. Our approach achieves a mIoU score of 46.97% on the DADA-seg benchmark, surpassing the previous state-of-the-art model by more than 7.50%. Code will be made publicly available at https://github.com/xinyu-laura/MMUDA.
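To make the Multi-Domain Mixed Sampling idea concrete, the sketch below illustrates a ClassMix/DACS-style mixing routine: a binary mask selects half of the classes in a labeled source (normal-scene) image and pastes those pixels onto an unlabeled target (accident-scene) image, so source semantics end up embedded in target-domain appearance. This is a minimal sketch under that assumption; the function names, tensor shapes, and the use of target pseudo-labels are illustrative and are not taken from the MMUDA code base.

```python
# Minimal sketch of cross-domain mixed sampling (ClassMix/DACS-style).
# Assumption: source_label is a (H, W) long tensor of class ids, images are
# (3, H, W) float tensors, and target_pseudo is a pseudo-label map produced
# elsewhere. All names here are illustrative, not the authors' implementation.
import torch


def classmix_mask(source_label: torch.Tensor, ignore_index: int = 255) -> torch.Tensor:
    """Binary mask covering a random half of the classes present in source_label."""
    classes = torch.unique(source_label)
    classes = classes[classes != ignore_index]
    n_keep = max(1, classes.numel() // 2)
    perm = torch.randperm(classes.numel(), device=classes.device)
    keep = classes[perm[:n_keep]]
    return torch.isin(source_label, keep).float()  # (H, W), 1 where kept classes lie


def mix_source_into_target(source_img, source_label, target_img, target_pseudo):
    """Paste masked source pixels onto a target image.

    Outside the mask, the mixed image keeps target-domain (accident-scene)
    appearance; inside the mask, it keeps source pixels and labels.
    """
    mask = classmix_mask(source_label)              # (H, W)
    m3 = mask.unsqueeze(0)                          # broadcast over RGB channels
    mixed_img = m3 * source_img + (1.0 - m3) * target_img
    mixed_label = (mask * source_label.float()
                   + (1.0 - mask) * target_pseudo.float()).long()
    return mixed_img, mixed_label
```

In a multi-source setup, such a routine would be applied to each source domain paired with the shared unlabeled target batch, so every training image the segmenter sees carries some target-domain context; how MMUDA actually schedules this across domains and combines it with the meta-learning step is described in the paper itself.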
