Paper Title
Improving Neural ODEs via Knowledge Distillation
Paper Authors
Paper Abstract
Neural Ordinary Differential Equations (Neural ODEs) construct the continuous dynamics of hidden units with ordinary differential equations specified by a neural network, and have demonstrated promising results on many tasks. However, Neural ODEs still do not perform well on image recognition tasks. A possible reason is that the one-hot encoding vectors commonly used to train Neural ODEs cannot provide enough supervision information. We propose a new training scheme based on knowledge distillation to construct more powerful and robust Neural ODEs for image recognition tasks. Specifically, we model the training of Neural ODEs as a teacher-student learning process, in which we adopt ResNets as the teacher model to provide richer supervision information. The experimental results show that the new training scheme can improve the classification accuracy of Neural ODEs by 24% on CIFAR10 and by 5% on SVHN. In addition, we quantitatively discuss the effect of both knowledge distillation and the time horizon on the robustness of Neural ODEs against adversarial examples. The experimental analysis concludes that introducing knowledge distillation and increasing the time horizon can improve the robustness of Neural ODEs against adversarial examples.
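The abstract does not spell out the exact distillation objective. A minimal sketch of the teacher-student training it describes, assuming a standard Hinton-style soft-target loss (temperature T, mixing weight alpha) with a pretrained ResNet as the frozen teacher and a Neural ODE classifier as the student, might look like the following in PyTorch. The names distillation_loss and train_step, as well as the hyperparameter values, are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of knowledge-distillation training for a Neural ODE student,
# assuming a standard Hinton-style soft-target loss; the paper's exact loss may differ.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Mix softened teacher guidance (KL term) with the usual one-hot cross-entropy term."""
    # Soft targets from the frozen ResNet teacher, softened by temperature T.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=1)
    soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean", log_target=True) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

def train_step(student, teacher, optimizer, images, labels):
    """One optimization step: teacher is a pretrained ResNet, student is a Neural ODE classifier."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)   # richer supervision than one-hot labels
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the soft teacher probabilities replace the one-hot vectors as the main supervision signal, which is the mechanism the abstract credits for the accuracy and robustness gains.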