Paper Title
LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN
Authors
Abstract
This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos. The framework generates a personalized keyshot summary for concurrent users by using the computational resources of the end-user device. State-of-the-art methods that acquire and process entire video data to generate video summaries are highly computationally intensive. In this regard, the proposed LTC-SUM method uses lightweight thumbnails to handle the complex process of detecting events. This significantly reduces computational complexity and improves communication and storage efficiency by resolving computational and privacy bottlenecks in resource-constrained end-user devices. These improvements were achieved by designing a lightweight 2D CNN model to extract features from thumbnails, which helped select and retrieve only a handful of specific segments. Extensive quantitative experiments on a set of 18 full feature-length videos (approximately 32.9 h in duration) showed that the proposed method is significantly more computationally efficient than state-of-the-art methods on the same end-user device configurations. Joint qualitative assessments with 56 participants showed that participants gave higher ratings to the summaries generated using the proposed method. To the best of our knowledge, this is the first attempt at designing a fully client-driven personalized keyshot video summarization framework using thumbnail containers for feature-length videos.
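The abstract does not say how thumbnail-level features are turned into the "handful of specific segments" that the client retrieves. The Python sketch below is one possible illustration, not the authors' method. It assumes that each thumbnail already has a relevance score (for example, produced by a 2D CNN), that thumbnails are sampled at a fixed interval, and that a simple threshold decides relevance. The function name `select_segments` and the parameters `seconds_per_thumbnail` and `threshold` are hypothetical.

```python
from typing import List, Tuple


def select_segments(
    scores: List[float],
    seconds_per_thumbnail: float,
    threshold: float,
) -> List[Tuple[float, float]]:
    """Group consecutive thumbnails scoring at or above the threshold
    into (start_seconds, end_seconds) ranges.

    Hypothetical helper: the scoring model, the sampling interval, and
    the thresholding rule are assumptions, not details from the paper.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first thumbnail in the current run
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a relevant run begins here
        elif score < threshold and start is not None:
            # the run ended at the previous thumbnail
            segments.append(
                (start * seconds_per_thumbnail, i * seconds_per_thumbnail)
            )
            start = None
    if start is not None:
        # the run extends to the end of the video
        segments.append(
            (start * seconds_per_thumbnail, len(scores) * seconds_per_thumbnail)
        )
    return segments


# Example: five thumbnails sampled every 2 seconds (made-up scores).
example = select_segments([0.1, 0.8, 0.9, 0.2, 0.7], 2.0, 0.5)
```

Under these assumptions, only the returned time ranges would need to be fetched from the server, which is the kind of saving in communication and storage that the abstract describes.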