Paper Title
VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams
Paper Authors
Paper Abstract
Video data is highly expressive and has traditionally been very difficult for a machine to interpret. Querying event patterns from video streams is challenging due to their unstructured representation. Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion. Current CEP systems have inherent limitations in querying video streams due to their unstructured data model and lack of an expressive query language. In this work, we focus on a CEP framework where users can define high-level expressive queries over videos to detect a range of spatiotemporal event patterns. In this context, we propose: i) VidCEP, an in-memory, on-the-fly, near real-time complex event matching framework for video streams; the system uses a graph-based event representation for video streams, which enables the detection of high-level semantic concepts from video using cascades of Deep Neural Network models; ii) a Video Event Query Language (VEQL) to express high-level user queries over video streams in CEP; iii) a complex event matcher to detect spatiotemporal video event patterns by matching expressive user queries over video data. The proposed approach detects spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89. VidCEP maintains near real-time performance with an average throughput of 70 frames per second across 5 parallel videos, with sub-second matching latency.
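To make the abstract's pipeline concrete, the following is a minimal sketch of the two core ideas it names: a graph-based event representation of a video frame (objects as nodes, spatial relations as edges) and a matcher that checks a spatiotemporal pattern over a window of frames. All class, relation, and function names here are illustrative assumptions for exposition; they are not VidCEP's actual schema or the VEQL syntax.

```python
from dataclasses import dataclass, field

# Hypothetical graph-based representation of one video frame, in the spirit
# of the abstract's description. In the real system, nodes/edges would come
# from cascades of DNN models (object detectors, relation extractors).

@dataclass
class FrameGraph:
    frame_id: int
    nodes: dict = field(default_factory=dict)   # object_id -> semantic label
    edges: list = field(default_factory=list)   # (subject_id, relation, object_id)

    def add_object(self, obj_id: str, label: str) -> None:
        self.nodes[obj_id] = label

    def add_relation(self, subj: str, relation: str, obj: str) -> None:
        self.edges.append((subj, relation, obj))


def match_pattern(graphs, label_a, relation, label_b, min_frames):
    """Return True if the spatial pattern (label_a, relation, label_b)
    holds in at least min_frames consecutive frame graphs — a toy stand-in
    for a spatiotemporal window match over the stream."""
    run = 0
    for g in graphs:
        hit = any(
            g.nodes.get(s) == label_a and r == relation and g.nodes.get(o) == label_b
            for s, r, o in g.edges
        )
        run = run + 1 if hit else 0
        if run >= min_frames:
            return True
    return False


# Usage: a toy stream where "person LEFT_OF car" holds for 3 consecutive frames.
stream = []
for i in range(3):
    g = FrameGraph(frame_id=i)
    g.add_object("o1", "person")
    g.add_object("o2", "car")
    g.add_relation("o1", "LEFT_OF", "o2")
    stream.append(g)

result = match_pattern(stream, "person", "LEFT_OF", "car", min_frames=3)
```

A real matcher would evaluate declarative VEQL queries against such graphs in near real time; this sketch only shows why a graph encoding makes spatial relations directly queryable.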