Paper Title
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
Paper Authors
Abstract
Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices. In this work, we investigate the optimality of this design pattern over a broad range of mobile accelerators by revisiting the usefulness of regular convolutions. We discover that regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators, provided that they are placed strategically in the network via neural architecture search. By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators. On the COCO object detection task, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on Google EdgeTPU, 3.4 mAP on Qualcomm Hexagon DSP and 2.7 mAP on Nvidia Jetson GPU without increasing latency. Moreover, MobileDets are comparable with the state-of-the-art MnasFPN on mobile CPUs even without using the feature pyramid, and achieve better mAP scores on both EdgeTPUs and DSPs with up to 2x speedup. Code and models are available in the TensorFlow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection.
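The abstract's contrast between depthwise-based building blocks and regular convolutions comes down to a cost trade-off: depthwise separable layers use far fewer weights (and FLOPs), but their low arithmetic intensity can under-utilize mobile accelerators, which is why strategically placed regular convolutions can improve the latency-accuracy trade-off. A minimal illustrative sketch (not code from the paper) of the parameter-count gap:

```python
def regular_conv_params(cin, cout, k=3):
    """Weights in a regular k x k convolution (biases omitted)."""
    return cin * cout * k * k

def depthwise_separable_params(cin, cout, k=3):
    """k x k depthwise conv followed by a 1 x 1 pointwise conv (biases omitted)."""
    return cin * k * k + cin * cout

# Example: a 64 -> 64 channel layer with a 3 x 3 kernel.
print(regular_conv_params(64, 64))         # 64 * 64 * 9   = 36864
print(depthwise_separable_params(64, 64))  # 576 + 4096    = 4672
```

The ~8x difference in weights (and proportionally in FLOPs at a fixed spatial resolution) explains why depthwise blocks dominate on mobile CPUs, while the paper's neural architecture search finds that the extra compute of regular convolutions is well spent on accelerators such as EdgeTPUs and DSPs.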