WITT: A Wireless Image Transmission Transformer for Semantic Communications

Paper Title

WITT: A Wireless Image Transmission Transformer for Semantic Communications

Authors

Ke Yang, Sixian Wang, Jincheng Dai, Kailin Tan, Kai Niu, Ping Zhang

Abstract

In this paper, we aim to redesign the vision Transformer (ViT) as a new backbone to realize semantic image transmission, termed wireless image transmission transformer (WITT). Previous works build upon convolutional neural networks (CNNs), which are inefficient in capturing global dependencies, resulting in degraded end-to-end transmission performance especially for high-resolution images. To tackle this, the proposed WITT employs Swin Transformers as a more capable backbone to extract long-range information. Different from ViTs in image classification tasks, WITT is highly optimized for image transmission while considering the effect of the wireless channel. Specifically, we propose a spatial modulation module to scale the latent representations according to channel state information, which enhances the ability of a single model to deal with various channel conditions. As a result, extensive experiments verify that our WITT attains better performance for different image resolutions, distortion metrics, and channel conditions. The code is available at https://github.com/KeYang8/WITT.
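
The idea of scaling latent representations according to channel state information can be illustrated with a small sketch. This is not the authors' implementation (that is in the repository linked above); it assumes a real-valued AWGN channel, unit-power normalization before transmission, and a hypothetical sigmoid mapping from SNR to per-channel scale factors. The function names, weights, and biases are illustrative and untrained.

```python
import math
import random

def power_normalize(symbols):
    """Scale a real-valued symbol vector to unit average power."""
    power = sum(s * s for s in symbols) / len(symbols)
    return [s / math.sqrt(power) for s in symbols]

def awgn(symbols, snr_db, rng):
    """Add white Gaussian noise, assuming unit-power input, at the given SNR in dB."""
    noise_std = math.sqrt(10 ** (-snr_db / 10))
    return [s + rng.gauss(0.0, noise_std) for s in symbols]

def snr_scale(snr_db, weights, biases):
    """Hypothetical stand-in for a learned mapping: SNR -> per-channel scale in (0, 1)."""
    return [1 / (1 + math.exp(-(w * snr_db + b))) for w, b in zip(weights, biases)]

def modulate(latent, scale):
    """Multiply each latent channel by its SNR-dependent scale factor."""
    return [x * g for x, g in zip(latent, scale)]

rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(8)]   # toy latent vector
weights = [0.1 * (i + 1) for i in range(8)]        # illustrative, not trained
biases = [0.0] * 8

scaled = modulate(latent, snr_scale(10.0, weights, biases))
received = awgn(power_normalize(scaled), 10.0, rng)
```

Because the scale factors depend on the SNR, the same encoder weights can emphasize different latent channels under different channel conditions, which is the property the abstract attributes to a single model handling various channels.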
