Paper title

Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks

Authors

Moussa, Denise; Hirsch, Germans; Riess, Christian

Abstract

Freely available and easy-to-use audio editing tools make it straightforward to perform audio splicing. Convincing forgeries can be created by combining various speech samples from the same person. Detection of such splices is important both in the public sector when considering misinformation, and in a legal context to verify the integrity of evidence. Unfortunately, most existing detection algorithms for audio splicing use handcrafted features and make specific assumptions. However, criminal investigators are often faced with audio samples from unconstrained sources with unknown characteristics, which raises the need for more generally applicable methods. With this work, we aim to take a first step towards unconstrained audio splicing detection to address this need. We simulate various attack scenarios in the form of post-processing operations that may disguise splicing. We propose a Transformer sequence-to-sequence (seq2seq) network for splicing detection and localization. Our extensive evaluation shows that the proposed method outperforms existing dedicated approaches for splicing detection [3, 10] as well as the general-purpose networks EfficientNet [28] and RegNet [25].
