Paper Title
Siamese-NAS: Using Trained Samples Efficiently to Find Lightweight Neural Architecture by Prior Knowledge
Paper Authors
Paper Abstract
In the past decade, many convolutional neural network architectures were designed by hand, such as VGG16, ResNet, and DenseNet. Each achieved state-of-the-art results on different tasks in its time. However, handcrafted design still relies on human intuition and experience, and trial and error consumes a great deal of time. Neural Architecture Search (NAS) focuses on this issue. In recent works, neural predictors have improved significantly while using only a few trained architectures as training samples, but there is still room to improve sampling efficiency. In this paper, we propose the Siamese-Predictor, inspired by past works on predictor-based NAS. It is constructed with the proposed Estimation Code, which encodes prior knowledge about the training procedure. The Siamese-Predictor benefits substantially from this idea, which allows it to surpass the current SOTA predictor on NASBench-201. To explore the impact of the Estimation Code, we analyze the relationship between it and accuracy. We also propose Tiny-NanoBench, a search space for lightweight CNN architectures; in this well-designed search space, it is easier to find better architectures with few FLOPs than in NASBench-201. In summary, the proposed Siamese-Predictor is a predictor-based NAS method. It reaches the SOTA level, especially under limited computation budgets, and when applied to the proposed Tiny-NanoBench it can find extremely lightweight CNN architectures with only a few trained samples.
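The abstract does not spell out the predictor's internals, but the general shape of a Siamese neural predictor that consumes an architecture encoding together with an Estimation Code can be sketched as follows. This is a minimal illustration in PyTorch; the layer sizes, the dimensions of both encodings, and the pairwise ranking objective are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SiamesePredictor(nn.Module):
    """Hypothetical sketch of a Siamese accuracy predictor.

    A shared branch embeds each candidate's architecture encoding
    concatenated with its Estimation Code (prior knowledge about the
    training procedure); a comparator head then scores which of the
    two candidates is likely to reach higher accuracy. All sizes are
    illustrative assumptions, not values from the paper.
    """

    def __init__(self, arch_dim: int = 64, code_dim: int = 16, hidden: int = 128):
        super().__init__()
        # Shared branch: (architecture encoding, estimation code) -> embedding.
        self.branch = nn.Sequential(
            nn.Linear(arch_dim + code_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # Comparator head: scores the pair from the two shared embeddings.
        self.head = nn.Linear(2 * hidden, 1)

    def embed(self, arch: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        return self.branch(torch.cat([arch, code], dim=-1))

    def forward(self, arch_a, code_a, arch_b, code_b):
        emb_a = self.embed(arch_a, code_a)
        emb_b = self.embed(arch_b, code_b)
        # Logit > 0 means candidate A is predicted to outperform candidate B.
        return self.head(torch.cat([emb_a, emb_b], dim=-1))


# One illustrative pairwise-ranking training step on random stand-in data.
model = SiamesePredictor()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

arch_a, arch_b = torch.randn(32, 64), torch.randn(32, 64)  # architecture encodings
code_a, code_b = torch.randn(32, 16), torch.randn(32, 16)  # estimation codes
label = torch.randint(0, 2, (32, 1)).float()               # 1 if A is more accurate

opt.zero_grad()
loss = loss_fn(model(arch_a, code_a, arch_b, code_b), label)
loss.backward()
opt.step()
```

A pairwise comparator is one common way to make the most of few trained samples in predictor-based NAS, since n trained architectures yield on the order of n² ranking pairs for supervision.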