Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

Title

Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

Authors

Yueming Jin, Yang Yu, Cheng Chen, Zixu Zhao, Pheng-Ann Heng, Danail Stoyanov

Abstract

Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolution, convolutional LSTM), which only make use of the local context. In this paper, we propose a novel framework, STswinCL, that explores the complementary intra- and inter-video relations to boost segmentation performance by progressively capturing the global context. We first develop a hierarchical Transformer to capture the intra-video relation, which includes richer spatial and temporal cues from neighboring pixels and previous frames. A joint space-time window shift scheme is proposed to efficiently aggregate these two cues into each pixel embedding. We then explore the inter-video relation via pixel-to-pixel contrastive learning, which structures the global embedding space well. A multi-source contrast training objective is developed to group the pixel embeddings across videos under ground-truth guidance, which is crucial for learning the global properties of the whole dataset. We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches. Code is available at https://github.com/YuemingJin/STswinCL.
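The pixel-to-pixel contrastive idea in the abstract can be illustrated with a small sketch. This is not the authors' multi-source objective, whose details are not given here; it is a generic supervised InfoNCE loss over pixel embeddings, assuming that pixels sharing a ground-truth class (from any video) act as positives and pixels of other classes act as negatives. The function name `pixel_contrastive_loss`, the temperature value, and the toy data are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised InfoNCE over pixel embeddings (illustrative only).

    embeddings: list of feature vectors, one per sampled pixel, which may be
                pooled from several videos.
    labels:     ground-truth class id for each pixel.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [j for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        neg_sum = sum(math.exp(dot(z[i], z[k]) / temperature) for k in neg)
        loss_i = 0.0
        for j in pos:
            pos_term = math.exp(dot(z[i], z[j]) / temperature)
            loss_i -= math.log(pos_term / (pos_term + neg_sum))
        total += loss_i / len(pos)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy pixels pooled from two hypothetical videos (class 0 and class 1).
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_clustered = pixel_contrastive_loss(clustered, labels)
loss_mixed = pixel_contrastive_loss(mixed, labels)
```

In this toy setup the loss is lower when same-class embeddings sit close together than when classes are interleaved, which is the sense in which such an objective structures the embedding space across videos.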
