Balanced Multimodal Learning via On-the-fly Gradient Modulation

Paper title

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Authors

Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Abstract

Multimodal learning helps to comprehensively understand the world by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we find that they are not fully exploited even when the multimodal model outperforms its uni-modal counterpart. Specifically, in this paper we point out that existing multimodal discriminative models, in which a uniform objective is designed for all modalities, can be left with under-optimized uni-modal representations because another modality dominates in some scenarios, e.g., sound in a blowing-wind event or vision in a drawing-picture event. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation to adaptively control the optimization of each modality by monitoring the discrepancy between their contributions to the learning objective. Further, extra Gaussian noise that changes dynamically is introduced to avoid a possible generalization drop caused by gradient modulation. As a result, we achieve considerable improvement over common fusion methods on different multimodal tasks, and this simple strategy can also boost existing multimodal methods, which illustrates its efficacy and versatility. The source code is available at https://github.com/GeWu-Lab/OGM-GE_CVPR2022.
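The idea in the abstract can be made concrete with a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The abstract does not state the formulas, so everything specific here is an assumption made for illustration: the function names `modulation_coefficients` and `modulate`, the tanh-based damping rule, the `alpha` and `eps` parameters, and the choice to match the noise scale to the spread of the current gradient.

```python
import math
import random
import statistics


def modulation_coefficients(score_a, score_b, alpha=1.0, eps=1e-8):
    """Return (k_a, k_b), each in [0, 1], for two modalities.

    score_a and score_b are non-negative measures of how much each
    modality currently contributes to the learning objective (for
    example, the confidence each one assigns to the correct class).
    The modality that contributes more gets a coefficient below 1,
    so its update is damped; the other keeps a coefficient of 1.
    The tanh rule is an illustrative assumption, not taken from the source.
    """
    ratio = (score_a + eps) / (score_b + eps)
    if ratio > 1.0:
        # Modality A dominates: damp A, leave B unchanged.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    if ratio < 1.0:
        # Modality B dominates: damp B, leave A unchanged.
        return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))
    return 1.0, 1.0


def modulate(grad, k, rng, add_noise=True):
    """Scale a gradient (list of floats) by k and optionally add noise.

    The noise is zero-mean Gaussian whose standard deviation equals the
    population standard deviation of the current gradient, so its scale
    changes as training proceeds. That scale is an assumption of this sketch.
    """
    scaled = [k * g for g in grad]
    if add_noise and len(grad) > 1:
        sigma = statistics.pstdev(grad)
        scaled = [s + rng.gauss(0.0, sigma) for s in scaled]
    return scaled


# Example: audio contributes far more than vision, so audio is damped.
rng = random.Random(0)
k_audio, k_visual = modulation_coefficients(0.9, 0.3)
audio_grad = modulate([0.2, -0.4, 0.1], k_audio, rng)
visual_grad = modulate([0.05, 0.02, -0.01], k_visual, rng)
```

In a training loop, these two calls would sit between the backward pass and the optimizer step, applied to the parameters of each uni-modal encoder. Damping only the dominant modality leaves the weaker one free to keep learning at its normal rate.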
