Paper title
Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
Paper authors
Abstract
Although convolutional neural networks (CNNs) have shown superior results in computer vision, implementing computer vision algorithms in real time at the edge remains challenging, especially on low-cost IoT devices, because of the high memory consumption and computational complexity of CNNs. Network compression methods such as weight pruning, filter pruning, and quantization are used to overcome this problem. Although filter pruning has shown better performance than the other techniques, the irregular number of filters pruned across the layers of a CNN might not comply with the majority of neural computing hardware architectures. In this paper, we propose a novel greedy approach called cluster pruning, which provides a structured way of removing filters in a CNN by considering the importance of filters and the underlying hardware architecture. The proposed method is compared with the conventional filter pruning algorithm on the Pascal-VOC open dataset and on the Head-Counting dataset, which is our own dataset developed to detect and count people entering a room. We benchmark our method on three hardware architectures, namely CPU, GPU, and the Intel Movidius Neural Compute Stick (NCS), using the popular SSD-MobileNet and SSD-SqueezeNet neural network architectures used for edge-AI vision applications. Results demonstrate that our method outperforms the conventional filter pruning method on both datasets and on all three hardware architectures. Furthermore, a low-cost IoT hardware setup consisting of an Intel Movidius NCS is proposed to deploy an edge-AI application using our pruning method.
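The abstract does not say how filter importance is measured or how the hardware constrains the number of filters, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes, purely for illustration, that importance is the L1 norm of a filter's weights and that the hardware prefers filters to be removed in whole groups of a fixed size. The names `l1_importance`, `prune_in_clusters`, `prune_fraction`, and `cluster_size` are hypothetical.

```python
from typing import List, Sequence


def l1_importance(filters: Sequence[Sequence[float]]) -> List[float]:
    """Score each filter by the sum of its absolute weights (assumed criterion)."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_in_clusters(
    filters: Sequence[Sequence[float]],
    prune_fraction: float,
    cluster_size: int,
) -> List[int]:
    """Return the indices of the filters to keep in one layer.

    Each filter is given as a flat list of weights. The number of removed
    filters is rounded down to a multiple of `cluster_size` (an assumed
    hardware-dependent group size), so filters only ever leave the layer
    in whole groups.
    """
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")

    n = len(filters)
    # Round the requested count down to a whole number of clusters.
    n_remove = (int(n * prune_fraction) // cluster_size) * cluster_size

    scores = l1_importance(filters)
    # Greedy step: the least important filters are removed first.
    order = sorted(range(n), key=lambda i: scores[i])
    removed = set(order[:n_remove])

    # Preserve the original ordering of the surviving filters.
    return [i for i in range(n) if i not in removed]
```

With eight filters, `prune_fraction=0.75`, and `cluster_size=4`, the requested six removals are rounded down to four, so exactly one group of four filters is dropped. A request that amounts to less than one full group removes nothing.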