Paper Title
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions
Authors
Abstract
Detecting suspicious activities in surveillance videos is a longstanding problem in real-time surveillance, and it makes crimes harder to detect. We therefore propose a novel approach for detecting and summarizing suspicious activities in surveillance videos. We have also created ground-truth summaries for the UCF-Crime video dataset. We modify a pre-existing approach for this task by using a Human-Object Interaction (HOI) model for the visual features in the Bi-Modal Transformer (BMT). Further, we validate our approach against existing state-of-the-art algorithms for the Dense Video Captioning task on the ActivityNet Captions dataset. We observe that this formulation for dense captioning performs significantly better than the other BMT-based approaches discussed on BLEU@1, BLEU@2, BLEU@3, BLEU@4, and METEOR. We further perform a comparative analysis of the dataset and the model and report findings for different NMS thresholds (searched using Genetic Algorithms). Here, our formulation outperforms all models on BLEU@1, BLEU@2, and BLEU@3, and most models on BLEU@4 and METEOR, falling short only of ADV-INF Global, by 25% and 0.5%, respectively.
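To illustrate what an NMS threshold controls in dense video captioning, the following is a minimal sketch of generic temporal non-maximum suppression over event proposals. It is not the authors' implementation: the abstract does not describe how NMS is applied, so the proposal format `(start_sec, end_sec, confidence)` and the function names `temporal_iou` and `temporal_nms` are assumptions made for illustration. The genetic-algorithm search over thresholds is not shown.

```python
from typing import List, Tuple

# Hypothetical proposal format: (start_sec, end_sec, confidence).
Proposal = Tuple[float, float, float]


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(proposals: List[Proposal], iou_threshold: float) -> List[Proposal]:
    """Greedy NMS: keep the highest-confidence proposals and drop any
    proposal whose IoU with an already kept one exceeds the threshold."""
    kept: List[Proposal] = []
    for cand in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Two heavily overlapping proposals and one separate proposal.
example = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
strict = temporal_nms(example, iou_threshold=0.5)   # drops the overlapping duplicate
lenient = temporal_nms(example, iou_threshold=0.9)  # keeps all three proposals
```

A lower threshold removes more overlapping proposals and so yields fewer captions per video; a higher threshold keeps more of them. This is why captioning metrics such as BLEU and METEOR can shift as the threshold changes.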