Title

Team VI-I2R Technical Report on EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021

Authors

Yi Cheng, Fen Fang, Ying Sun

Abstract

In this report, we present the technical details of our approach to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation (UDA) Challenge for Action Recognition. The EPIC-KITCHENS-100 dataset consists of daily kitchen activities focusing on the interaction between human hands and their surrounding objects. It is very challenging to accurately recognize these fine-grained activities due to the presence of distracting objects and visually similar action classes, especially in the unlabelled target domain. Based on an existing method for video domain adaptation, i.e., TA3N, we propose to learn hand-centric features by leveraging hand bounding box information for UDA on fine-grained action recognition. This helps reduce the distraction from the background and facilitates the learning of domain-invariant features. To achieve high-quality hand localization, we adopt an uncertainty-aware domain adaptation network, i.e., MEAA, to train a domain-adaptive hand detector, which uses only very limited hand bounding box annotations in the source domain but generalizes well to the unlabelled target domain. Our submission achieved first place in terms of top-1 action recognition accuracy, using only the RGB and optical flow modalities as input.
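
The abstract leaves the exact hand-centric feature mechanism and the TA3N alignment details to the main report. Below is a minimal sketch, assuming a PyTorch implementation, of the two building blocks it names: cropping frames around detected hand boxes to obtain hand-centric inputs, and a gradient reversal layer, the standard adversarial-alignment device that TA3N builds on. The function name `hand_centric_crop`, the `margin` parameter, and the box format are illustrative assumptions, not taken from the report.

```python
# Sketch of two building blocks named in the abstract (assumed details,
# not the authors' code): hand-centric cropping and gradient reversal.
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in the
    backward pass, so the feature extractor is pushed toward
    domain-invariant features while a domain classifier trains on top."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lambd, None


def hand_centric_crop(frame, hand_boxes, margin=0.15):
    """Crop a frame to the union of predicted hand boxes, expanded by a
    relative margin so that manipulated objects stay in view.

    frame:      float tensor of shape (C, H, W)
    hand_boxes: iterable of (x1, y1, x2, y2) pixel coordinates
    """
    _, H, W = frame.shape
    x1 = min(b[0] for b in hand_boxes)
    y1 = min(b[1] for b in hand_boxes)
    x2 = max(b[2] for b in hand_boxes)
    y2 = max(b[3] for b in hand_boxes)
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(W, int(x2 + dx)), min(H, int(y2 + dy))
    return frame[:, y1:y2, x1:x2]


if __name__ == "__main__":
    frame = torch.rand(3, 256, 456)                      # one RGB frame
    boxes = [(40, 120, 110, 200), (180, 130, 250, 210)]  # two detected hands
    crop = hand_centric_crop(frame, boxes)
    print(crop.shape)                                    # hand-centric region
    feat = GradReverse.apply(crop.flatten(), 0.5)        # reversed gradients flow back
```

In a pipeline like this, the cropped region would be resized and fed to the RGB and optical-flow backbones before temporal aggregation; the margin trades off background suppression against keeping the interacted object visible.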
