Paper Title


Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras

Authors

Gehrig, Daniel, Scaramuzza, Davide

Abstract


State-of-the-art machine-learning methods for event cameras treat events as dense representations and process them with conventional deep neural networks. Thus, they fail to maintain the sparsity and asynchronous nature of event data, thereby imposing significant computation and latency constraints on downstream systems. A recent line of work tackles this issue by modeling events as spatiotemporally evolving graphs that can be efficiently and asynchronously processed using graph neural networks. These works showed impressive computation reductions, yet their accuracy is still limited by the small scale and shallow depth of their networks, both of which are required to reduce computation. In this work, we break this glass ceiling by introducing several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation. On object detection tasks, our smallest model shows up to 3.7 times lower computation, while outperforming state-of-the-art asynchronous methods by 7.4 mAP. Even when scaling to larger model sizes, we are 13% more efficient than state-of-the-art while outperforming it by 11.5 mAP. As a result, our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass. This opens the door to efficient and accurate object detection in edge-case scenarios.
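To make the "events as a spatiotemporally evolving graph" idea concrete, the sketch below builds a toy event graph: each event is a node carrying (x, y, t, polarity), and directed edges connect an event to earlier events that fall within a small spatial radius and time window. This is an illustrative neighborhood rule only; the `radius` and `dt` thresholds and the `build_event_graph` helper are assumptions for the example, not the paper's actual construction.

```python
def build_event_graph(events, radius=3, dt=0.05):
    """Connect each event to earlier events in a spatiotemporal neighborhood.

    events: list of (x, y, t, polarity) tuples, sorted by time t.
    Returns a list of directed edges (older_index, newer_index).
    NOTE: an illustrative sketch of event-graph construction, not the
    exact rule used in the paper.
    """
    edges = []
    for i in range(len(events)):
        xi, yi, ti, _ = events[i]
        for j in range(i):  # only look at earlier events
            xj, yj, tj, _ = events[j]
            close_in_space = abs(xi - xj) <= radius and abs(yi - yj) <= radius
            close_in_time = 0 < ti - tj <= dt
            if close_in_space and close_in_time:
                edges.append((j, i))  # directed edge: older -> newer
    return edges

# Toy event stream: (x, y, t, polarity)
events = [(10, 10, 0.00, 1),
          (11, 10, 0.01, 1),
          (50, 50, 0.02, 0),   # spatially isolated event
          (12, 11, 0.03, 1)]
edges = build_event_graph(events)
# The isolated event at (50, 50) gets no edges; the three nearby
# events form a small connected subgraph.
```

A graph built incrementally like this is what lets a graph neural network process each new event asynchronously: only the neighborhood of the newly inserted node needs to be recomputed, rather than the whole graph.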
