Paper Title
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
Paper Authors
Paper Abstract
Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies, 2) presenting a novel building block with diverse spatial pooling as a core, and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as an efficient plug-and-play module in existing scene parsing networks. Extensive experiments on popular benchmarks (e.g., ADE20K and Cityscapes) demonstrate that our simple approach establishes new state-of-the-art results. Code is made available at https://github.com/Andrew-Qibin/SPNet.
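To make the idea of a long, narrow kernel concrete, the following is a minimal pure-Python sketch of strip pooling on a single-channel H x W feature map, assuming average pooling over full-length strips. The function names and the final broadcast-and-add step are illustrative assumptions, not the authors' implementation; the abstract does not specify how the pooled strips are fused, and the released code should be consulted for the actual module.

```python
def strip_pool_horizontal(x):
    """Average each row with a 1 x W kernel, giving one value per row (H values)."""
    return [sum(row) / len(row) for row in x]


def strip_pool_vertical(x):
    """Average each column with an H x 1 kernel, giving one value per column (W values)."""
    height = len(x)
    return [sum(col) / height for col in zip(*x)]


def expand_and_add(row_means, col_means):
    """Broadcast both strips back to H x W and sum them.

    Hypothetical fusion step, used here only to show how each output
    position sees context from its entire row and its entire column.
    """
    return [[r + c for c in col_means] for r in row_means]


# Toy 2 x 3 feature map.
feature_map = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

row_means = strip_pool_horizontal(feature_map)  # [2.0, 5.0]
col_means = strip_pool_vertical(feature_map)    # [2.5, 3.5, 4.5]
fused = expand_and_add(row_means, col_means)    # 2 x 3 map again
```

Unlike an NxN window, each strip spans a whole row or column, so a single pooled value summarizes context along one full spatial dimension while staying narrow along the other.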