Paper Title
A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences
Paper Authors
Paper Abstract
Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. Thus, simply replacing an existing loss (e.g., deep feature loss) with our metric yields significant improvement in a denoising network, as measured by subjective pairwise comparison.
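The abstract does not say how the perturbation space is probed. As a minimal sketch only, assuming that a single scalar controls perturbation strength and that detection is monotonic in that strength, a bisection search could narrow in on a subject's JND from a sequence of "same or different?" answers. The names `estimate_jnd`, `listener`, and `make_simulated_listener` are hypothetical, and the simulated listener stands in for a real crowdsourced response; this is not presented as the authors' procedure.

```python
def estimate_jnd(listener, lo=0.0, hi=1.0, steps=12):
    """Bisect perturbation strength to bracket a just-noticeable difference.

    `listener(strength)` is a hypothetical callback that returns True when
    the subject reports that the perturbed recording differs from the
    reference. Assumes `lo` is inaudible, `hi` is audible, and detection
    is monotonic in strength.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if listener(mid):
            hi = mid  # difference heard: the JND is at or below mid
        else:
            lo = mid  # difference not heard: the JND is above mid
    return (lo + hi) / 2.0


def make_simulated_listener(threshold):
    """Stand-in for a human subject with a fixed, noise-free threshold."""
    return lambda strength: strength >= threshold


# Usage: a simulated subject whose true threshold is 0.3.
jnd = estimate_jnd(make_simulated_listener(0.3))
```

Each query in such a search also produces a labeled pair (perturbation strength, same/different answer), which is the kind of binary judgment the abstract describes fitting the network to. Real subjects answer noisily, so a practical procedure would need repeated trials or a more robust adaptive scheme.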