Paper Title

Event Transformer

Authors

Bin Jiang, Zhihao Li, M. Salman Asif, Xun Cao, Zhan Ma

Abstract

The event camera's low power consumption and ability to capture microsecond brightness changes make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs). However, these approaches often sacrifice temporal granularity or require specialized devices for processing. This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token. This approach preserves the sequence's intricate spatiotemporal attributes at the event level. Moreover, we propose a Three-way Attention mechanism in the Event Transformer Block (ETB) to collaboratively construct temporal and spatial correlations between events. We compare our proposed token-based event representation extensively with other prevalent methods for object classification and optical flow estimation. The experimental results showcase its competitive performance while demanding minimal computational resources on standard devices. Our code is publicly accessible at \url{https://github.com/NJUVISION/EventTransformer}.
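To make the token-based idea concrete, below is a minimal PyTorch sketch of the general pipeline the abstract describes: each raw event (x, y, t, polarity) is linearly projected into an event-token, and the resulting token sequence is processed by a Transformer encoder block. This is an illustrative assumption, not the authors' implementation: the Three-way Attention of the actual Event Transformer Block is not reproduced here, and ordinary multi-head self-attention stands in for it. All class names, dimensions, and the 4-feature event encoding are hypothetical; see the linked repository for the real code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: treat each raw event (x, y, t, p) as a token,
# then apply a plain Transformer encoder block. Standard multi-head
# self-attention is used here in place of the paper's Three-way Attention.

class EventTokenEmbedding(nn.Module):
    """Project raw event tuples [x, y, t, polarity] into d-dimensional event-tokens."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(4, dim)

    def forward(self, events):
        # events: (batch, num_events, 4) with columns [x, y, t, p]
        return self.proj(events)

class EventTransformerBlockSketch(nn.Module):
    """A generic pre-norm Transformer block applied to event-tokens (illustrative only)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens):
        # Self-attention over all event-tokens, with residual connections.
        h = self.norm1(tokens)
        attn_out, _ = self.attn(h, h, h)
        tokens = tokens + attn_out
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens

# Usage with 1,024 synthetic, normalized events from one recording.
events = torch.rand(1, 1024, 4)            # [x, y, t, p] in [0, 1]
tokens = EventTokenEmbedding(64)(events)   # (1, 1024, 64) event-tokens
out = EventTransformerBlockSketch(64)(tokens)
print(out.shape)                           # torch.Size([1, 1024, 64])
```

Because every event carries its own timestamp inside its token, no binning into frames or voxel grids is required, which is the temporal-granularity point the abstract emphasizes.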
