Paper Title
Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition
Paper Authors
Paper Abstract
Human Activity Recognition is a field of research where input data can take many forms. Each of the possible input modalities describes human behaviour in a different way, and each has its own strengths and weaknesses. We explore the hypothesis that leveraging multiple modalities can lead to better recognition. Since manual annotation of input data is expensive and time-consuming, the emphasis is placed on self-supervised methods which can learn useful feature representations without any ground truth labels. We extend a number of recent contrastive self-supervised approaches to the task of Human Activity Recognition, leveraging inertial and skeleton data. Furthermore, we propose a flexible, general-purpose framework for performing multimodal self-supervised learning, named Contrastive Multiview Coding with Cross-Modal Knowledge Mining (CMC-CMKM). This framework exploits modality-specific knowledge in order to mitigate the limitations of typical self-supervised frameworks. Extensive experiments on two widely-used datasets demonstrate that the proposed framework significantly outperforms contrastive unimodal and multimodal baselines in different scenarios, including fully-supervised fine-tuning, activity retrieval and semi-supervised learning. Furthermore, it shows competitive performance even when compared to supervised methods.
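As a concrete illustration of the contrastive multiview idea the abstract refers to, below is a minimal sketch of a symmetric cross-modal InfoNCE loss between inertial and skeleton embeddings. The function name, tensor shapes and temperature value are illustrative assumptions for this sketch; it does not reproduce the authors' CMC-CMKM implementation or its knowledge-mining step.

```python
# Minimal sketch: symmetric cross-modal InfoNCE (CMC-style) between two modalities.
# Names, shapes and the temperature are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def cross_modal_info_nce(z_inertial: torch.Tensor,
                         z_skeleton: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Matching (inertial, skeleton) pairs from the same sample are positives;
    all other pairs in the batch act as negatives."""
    z_i = F.normalize(z_inertial, dim=1)        # (B, D) unit-norm inertial embeddings
    z_s = F.normalize(z_skeleton, dim=1)        # (B, D) unit-norm skeleton embeddings
    logits = z_i @ z_s.t() / temperature        # (B, B) cross-modal similarity matrix
    targets = torch.arange(z_i.size(0), device=z_i.device)  # diagonal = positives
    loss_i2s = F.cross_entropy(logits, targets)       # inertial -> skeleton direction
    loss_s2i = F.cross_entropy(logits.t(), targets)   # skeleton -> inertial direction
    return 0.5 * (loss_i2s + loss_s2i)

# Example usage with random embeddings standing in for two modality-specific encoders.
if __name__ == "__main__":
    z_imu = torch.randn(32, 128)
    z_skel = torch.randn(32, 128)
    print(cross_modal_info_nce(z_imu, z_skel).item())
```

In a multimodal setup of this kind, each modality typically has its own encoder and projection head, and the cross-modal loss aligns their embedding spaces; the paper's cross-modal knowledge mining additionally uses modality-specific cues to refine which pairs are treated as positives and negatives.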