Paper Title

Boosting Crowd Counting via Multifaceted Attention

Authors

Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Xiaopeng Hong

Abstract

This paper focuses on the challenging crowd counting task. As large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle this kind of variation well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from a vanilla transformer, learnable local attention, and instance attention into a counting model. First, Learnable Region Attention (LRA) is proposed to dynamically assign an exclusive attention region to each feature location. Second, we design a Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention of different feature locations. Finally, we provide an Instance Attention mechanism to dynamically focus on the most important instances during training. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU, validate the proposed method. Code: https://github.com/LoraLinH/Boosting-Crowd-Counting-via-Multifaceted-Attention.
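To make the core idea concrete, below is a minimal PyTorch sketch of one way to blend vanilla global attention with a per-query local region, in the spirit of the LRA described above. This is an illustrative assumption, not the paper's implementation: the LocalGlobalAttention module, the Gaussian-bandwidth parameterization via a sigma head, and all shapes are invented for this sketch; MAN's actual LRA, regularization, and instance attention follow the formulation in the paper and the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalGlobalAttention(nn.Module):
    # Toy single-head attention over an h x w feature map that adds a
    # query-dependent Gaussian locality bias to global dot-product attention.
    # Hypothetical sketch; not the actual MAN/LRA architecture.
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.sigma = nn.Linear(dim, 1)  # per-location region size (bandwidth)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence flattened from an h x w feature map.
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Global branch: standard scaled dot-product attention logits.
        logits = q @ k.transpose(-2, -1) / C ** 0.5  # (B, N, N)

        # Pairwise squared spatial distances between the N locations.
        ys, xs = torch.meshgrid(
            torch.arange(h, device=x.device),
            torch.arange(w, device=x.device),
            indexing="ij",
        )
        coords = torch.stack([ys.flatten(), xs.flatten()], -1).float()  # (N, 2)
        dist2 = ((coords[:, None] - coords[None]) ** 2).sum(-1)  # (N, N)

        # Local branch: each query predicts its own bandwidth, so locations
        # covering large (close-up) heads can attend widely while locations
        # covering small (distant) heads stay local.
        sigma = F.softplus(self.sigma(x)) + 1e-3  # (B, N, 1), strictly positive
        local_bias = -dist2 / (2.0 * sigma ** 2)  # broadcasts to (B, N, N)

        attn = (logits + local_bias).softmax(dim=-1)
        return self.proj(attn @ v)


# Quick shape check on a 16 x 16 feature map with 64 channels.
x = torch.randn(2, 16 * 16, 64)
out = LocalGlobalAttention(64)(x, h=16, w=16)
print(out.shape)  # torch.Size([2, 256, 64])
```

The Gaussian bias is just one simple way to let each feature location choose its own effective receptive field; the paper's LRA learns the attention region itself, and its Local Attention Regularization additionally constrains how the attention varies across locations.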
