Paper Title

Real-Time Video Inference on Edge Devices via Adaptive Model Streaming

Paper Authors

Mehrdad Khani, Pouya Hamadanian, Arash Nasr-Esfahany, Mohammad Alizadeh

Paper Abstract

Real-time video inference on edge devices like mobile phones and drones is challenging due to the high computation cost of Deep Neural Networks. We present Adaptive Model Streaming (AMS), a new approach to improving performance of efficient lightweight models for video inference on edge devices. AMS uses a remote server to continually train and adapt a small model running on the edge device, boosting its performance on the live video using online knowledge distillation from a large, state-of-the-art model. We discuss the challenges of over-the-network model adaptation for video inference, and present several techniques to reduce communication cost of this approach: avoiding excessive overfitting, updating a small fraction of important model parameters, and adaptive sampling of training frames at edge devices. On the task of video semantic segmentation, our experimental results show 0.4--17.8 percent mean Intersection-over-Union improvement compared to a pre-trained model across several video datasets. Our prototype can perform video segmentation at 30 frames-per-second with 40 milliseconds camera-to-label latency on a Samsung Galaxy S10+ mobile phone, using less than 300 Kbps uplink and downlink bandwidth on the device.
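The abstract describes the AMS mechanism: a remote server continually distills a large teacher model into the lightweight student running on the device, and reduces communication cost by limiting training to avoid overfitting and by sending only a small fraction of important parameter updates. Below is a minimal sketch of one server-side adaptation round under those assumptions, written in PyTorch; the function name adapt_student, the hyperparameters (lr, steps, frac), and the frame buffer are illustrative placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of one AMS server-side adaptation round (not the
# authors' code): distill the teacher into the student on sampled frames,
# then sparsify the weight update before sending it downlink.
import copy
import torch
import torch.nn.functional as F

def adapt_student(student, teacher, frames, lr=1e-4, steps=5, frac=0.05):
    """Distill `teacher` into `student` on recently sampled video frames,
    then keep only the largest `frac` of parameter deltas so the update
    sent to the edge device stays small."""
    reference = copy.deepcopy(student)  # weights currently on the device
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(steps):  # few gradient steps, to avoid overfitting
        for x in frames:    # x: (1, 3, H, W) frame tensor
            with torch.no_grad():
                soft = F.softmax(teacher(x), dim=1)  # teacher's soft labels
            loss = F.kl_div(F.log_softmax(student(x), dim=1), soft,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Sparsify the update: transmit only the largest-magnitude deltas.
    update = {}
    for (name, p_new), (_, p_old) in zip(student.named_parameters(),
                                         reference.named_parameters()):
        delta = (p_new - p_old).detach().flatten()
        k = max(1, int(frac * delta.numel()))
        idx = delta.abs().topk(k).indices
        update[name] = (idx, delta[idx])  # sparse (index, value) pairs
    return update
```

Returning only the top fraction of parameter deltas mirrors the abstract's point about updating a small fraction of important model parameters, which is what keeps the downlink bandwidth within the reported few-hundred-kbps budget.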
