Paper Title

Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

Paper Authors

Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao

Paper Abstract

Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and one key to achieving this is the availability of well-trained classifiers on benchmark datasets that can be used as backbone feature extractors in downstream tasks. As the mainstream loss function for training deep neural network (DNN) classifiers, the cross-entropy loss can easily lead us to models that exhibit severe overfitting when no other techniques, such as data augmentation, are used to alleviate it. In this paper, we prove that the existing cross-entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution, i.e., the information or uncertainty remaining in the label after the input is revealed. We then propose a mutual information learning framework in which we train DNN classifiers by learning the mutual information between the label and the input. Theoretically, we give a lower bound on the population error probability in terms of the mutual information. In addition, we derive lower and upper bounds on the mutual information for a concrete binary classification data model in $\mathbb{R}^n$, as well as the corresponding error probability lower bound in this scenario. Furthermore, we establish the sample complexity for accurately learning the mutual information from empirical samples drawn from the underlying data distribution. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. Without bells and whistles, the proposed mutual information learned classifiers (MILCs) achieve far better generalization performance than state-of-the-art classifiers, with an improvement that can exceed 10\% in test accuracy.
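For context, the abstract's claim that cross-entropy training targets the conditional entropy, while mutual information learning additionally requires the label entropy, rests on the standard decomposition below; the accompanying Fano-type inequality is the classical error-probability lower bound in terms of mutual information, and the paper's own bounds may be tighter or stated differently.

\[
I(X;Y) = H(Y) - H(Y \mid X),
\]

where $H(Y \mid X)$ is the quantity that cross-entropy minimization estimates, so learning $I(X;Y)$ additionally requires an estimate of the label entropy $H(Y)$. For any classifier $\hat{Y}$ computed from $X$ (so that $Y$–$X$–$\hat{Y}$ forms a Markov chain), Fano's inequality lower-bounds the error probability $P_e = \Pr[\hat{Y} \neq Y]$ via

\[
H_b(P_e) + P_e \log\bigl(|\mathcal{Y}| - 1\bigr) \;\ge\; H(Y \mid X) = H(Y) - I(X;Y),
\]

where $H_b(\cdot)$ is the binary entropy function and $|\mathcal{Y}|$ is the number of classes; in the binary case the second term on the left vanishes.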
