Paper Title
Semantic-preserved Communication System for Highly Efficient Speech Transmission
Authors
Abstract
Deep learning (DL)-based semantic communication methods have been explored in recent years for the efficient transmission of images, text, and speech. In contrast to traditional wireless communication methods, which focus on the transmission of abstract symbols, semantic communication approaches aim for better transmission efficiency by sending only the semantically relevant information in the source data. In this paper, we consider semantic-oriented speech transmission, which transmits only the semantic-relevant information over the channel for the speech recognition task, plus a compact additional set of semantic-irrelevant information for the speech reconstruction task. We propose a novel end-to-end DL-based transceiver that extracts and encodes the semantic information from the input speech spectra at the transmitter and outputs the corresponding transcriptions from the decoded semantic information at the receiver. For speech-to-speech transmission, we further include a connectionist temporal classification (CTC) alignment module that extracts a small amount of additional semantic-irrelevant but speech-related information to better reconstruct the original speech signals at the receiver. Simulation results confirm that the proposed method outperforms current methods in both the accuracy of the predicted text for speech-to-text transmission and the quality of the recovered speech signals for speech-to-speech transmission, while significantly improving transmission efficiency. More specifically, the proposed method sends only 16% of the transmitted symbols required by existing methods while achieving about a 10% reduction in word error rate (WER) for speech-to-text transmission. For speech-to-speech transmission, the gain in transmission efficiency is even more remarkable: only 0.2% of the transmitted symbols required by the existing method are needed.
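The speech-to-text result above is reported as a word error rate (WER). As a point of reference, the following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the authors' evaluation script or any text normalization they apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption, not the paper's procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sat on mat" involves one deleted word out of six reference words, so the function returns 1/6.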