Masked Transformer for Image Anomaly Localization

Paper title

Masked Transformer for image Anomaly Localization

Authors

De Nardin, Axel, Mishra, Pankaj, Foresti, Gian Luca, Piciarelli, Claudio

Abstract

Image anomaly detection consists of detecting images or image portions that are visually different from the majority of the samples in a dataset. The task is of practical importance for various real-life applications such as biomedical image analysis, visual inspection in industrial production, banking, and traffic management. Most current deep learning approaches rely on image reconstruction: the input image is projected into a latent space and then reconstructed, assuming that the network (mostly trained on normal data) will not be able to reconstruct the anomalous portions. However, this assumption does not always hold. We thus propose a new model based on the Vision Transformer architecture with patch masking: the input image is split into several patches, and each patch is reconstructed only from the surrounding data, thus ignoring the potentially anomalous information contained in the patch itself. We then show that multi-resolution patches and their collective embeddings provide a large improvement in the model's performance compared to the exclusive use of traditional square patches. The proposed model has been tested on popular anomaly detection datasets such as MVTec and head CT and achieved good results when compared to other state-of-the-art approaches.
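The patch-masking idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: the function names are hypothetical, and the Vision Transformer is replaced by a deliberately naive stand-in that averages the neighbouring patches. It shows only the masking principle, where each patch is predicted without reading its own pixels and the reconstruction error serves as a per-patch anomaly score.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W) image into a grid of shape (H//patch, W//patch, patch, patch)."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    return image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)

def reconstruct_from_neighbors(grid, i, j):
    """Hypothetical stand-in for the transformer: average the adjacent patches.

    Patch (i, j) itself is masked, i.e. never read, so any anomaly it
    contains cannot leak into its own reconstruction.
    Assumes the grid holds at least two patches.
    """
    rows, cols = grid.shape[:2]
    neighbors = [
        grid[r, c]
        for r in range(max(i - 1, 0), min(i + 2, rows))
        for c in range(max(j - 1, 0), min(j + 2, cols))
        if (r, c) != (i, j)
    ]
    return np.mean(neighbors, axis=0)

def anomaly_map(image, patch=4):
    """Return one score per patch: mean squared error between patch and reconstruction."""
    grid = split_into_patches(image, patch)
    rows, cols = grid.shape[:2]
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            recon = reconstruct_from_neighbors(grid, i, j)
            scores[i, j] = np.mean((grid[i, j] - recon) ** 2)
    return scores

# Usage: a flat image with one bright patch; the highest score marks that patch.
img = np.zeros((16, 16))
img[4:8, 8:12] = 1.0
print(np.unravel_index(np.argmax(anomaly_map(img)), (4, 4)))
```

In the proposed model, the neighbour average would be replaced by a learned reconstruction trained on normal data, and the abstract additionally reports gains from multi-resolution patches, which this sketch does not attempt to reproduce.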
