Towards Streaming Perception

Paper title

Towards Streaming Perception

Authors

Mengtian Li, Yu-Xiong Wang, Deva Ramanan

Abstract

Embodied perception refers to the ability of an autonomous agent to perceive its environment so that it can (re)act. The responsiveness of the agent is largely governed by latency of its processing pipeline. While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto optimal latency-accuracy curve. We point out a discrepancy between standard offline evaluation and real-time applications: by the time an algorithm finishes processing a particular frame, the surrounding world has changed. To these ends, we present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, which we refer to as "streaming accuracy". The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant, forcing the stack to consider the amount of streaming data that should be ignored while computation is occurring. More broadly, building upon this metric, we introduce a meta-benchmark that systematically converts any single-frame task into a streaming perception task. We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations. Our proposed solutions and their empirical analysis demonstrate a number of surprising conclusions: (1) there exists an optimal "sweet spot" that maximizes streaming accuracy along the Pareto optimal latency-accuracy curve, (2) asynchronous tracking and future forecasting naturally emerge as internal representations that enable streaming perception, and (3) dynamic scheduling can be used to overcome temporal aliasing, yielding the paradoxical result that latency is sometimes minimized by sitting idle and "doing nothing".
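The abstract's central idea, evaluating at every time instant whatever output the perception stack has available by then, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `pair_latest_predictions`, the millisecond timestamps, and the rule that a prediction counts once its finish time is at or before the query time are all assumptions made for illustration.

```python
from bisect import bisect_right


def pair_latest_predictions(query_times, finish_times):
    """For each query time, return the index of the most recent prediction
    that had finished by that time, or None if none had finished yet.

    Hypothetical helper: `finish_times` must be sorted in ascending order.
    """
    pairs = []
    for t in query_times:
        # Count predictions with finish time <= t; the last of them is the
        # newest output available at time t.
        k = bisect_right(finish_times, t) - 1
        pairs.append(k if k >= 0 else None)
    return pairs


# Assumed example: frames arrive every 33 ms and a detector emits a result
# every 50 ms, so several frames are scored against a stale prediction.
frame_times_ms = [0, 33, 66, 99, 132]
finish_times_ms = [50, 100, 150, 200]
print(pair_latest_predictions(frame_times_ms, finish_times_ms))
```

In this sketch, frames with no finished prediction are paired with `None`, and a later frame can be scored against a prediction computed from an earlier frame. That is the mismatch between offline and streaming evaluation that the abstract points out.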
