Paper Title
Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network
Paper Authors
Paper Abstract
Video capsule endoscopy is a hot topic in computer vision and medicine. Deep learning can have a positive impact on the future of video capsule endoscopy technology: it can improve the anomaly detection rate, reduce physicians' screening time, and aid in real-world clinical analysis. CADx classification systems for video capsule endoscopy have shown great promise for further improvement. For example, detection of cancerous polyps and bleeding can lead to a swift medical response and improve patient survival rates. To this end, an automated CADx system must have high throughput and decent accuracy. In this paper, we propose FocalConvNet, a focal modulation network integrated with lightweight convolutional layers for the classification of small bowel anatomical landmarks and luminal findings. FocalConvNet leverages focal modulation to attain global context and allows global-local spatial interactions throughout the forward pass. Moreover, the convolutional block, with its intrinsic inductive/learning bias and capacity to extract hierarchical features, allows our FocalConvNet to achieve favourable results with high throughput. We compare FocalConvNet with other SOTA methods on Kvasir-Capsule, a large-scale VCE dataset with 44,228 frames spanning 13 classes of different anomalies. Our proposed method achieves a weighted F1-score, recall and MCC of 0.6734, 0.6373 and 0.2974, respectively, outperforming other SOTA methodologies. Furthermore, we report the highest throughput of 148.02 images/second, establishing the potential of FocalConvNet in a real-time clinical environment. The code of the proposed FocalConvNet is available at https://github.com/NoviceMAn-prog/FocalConvNet.
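To make the abstract's "global-local spatial interaction" idea concrete, the following is a minimal, dependency-free sketch of the focal modulation mechanism the paper builds on. It is a 1-D toy with scalar features and fixed gates; the function names, window sizes, and gate values are illustrative assumptions, not the authors' implementation, which operates on 2-D feature maps with learned projections and depthwise convolutions.

```python
def moving_average(x, window):
    """Local context at one focal level: mean over a centered window."""
    n = len(x)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        seg = x[lo:hi]
        out.append(sum(seg) / len(seg))
    return out


def focal_modulation_1d(x, windows=(3, 5), gates=(0.5, 0.3, 0.2)):
    """Toy focal modulation over a 1-D list of scalar features.

    windows : growing receptive fields for the local focal levels
              (hypothetical sizes; the real model learns depthwise convs).
    gates   : per-level mixing weights; the last gate weights the
              global-average level. Learned in the real model.
    Returns q * m per position, with the identity as the query q = x.
    """
    # Hierarchical contextualization: several local levels plus one
    # global level, mirroring the growing receptive fields in the paper.
    levels = [moving_average(x, w) for w in windows]
    global_ctx = sum(x) / len(x)
    levels.append([global_ctx] * len(x))

    out = []
    for i, q in enumerate(x):
        # Modulator: gated aggregation of the contexts at position i.
        m = sum(g * lvl[i] for g, lvl in zip(gates, levels))
        out.append(q * m)  # element-wise query-modulator interaction
    return out
```

Because every output element mixes window-pooled and globally pooled context before being multiplied back into the local query, each position interacts with the whole sequence in a single pass, which is the property the abstract credits for combining global context with convolutional throughput.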