Paper title
Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators
Paper authors
Abstract
As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical and privacy-sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault-tolerance properties of DNNs. Prior work primarily focuses on metrics such as the Failures In Time (FIT) rate and the Silent Data Corruption (SDC) rate, which quantify how often a device fails. Instead, this paper focuses on quantifying the DNN accuracy given that a transient error has occurred, which tells us how well a network behaves when a transient error occurs. We call this metric Resiliency Accuracy (RA). We show that the existing RA formulation is fundamentally inaccurate, because it incorrectly assumes that software variables (model weights/activations) have equal faulty probability under hardware transient faults. We present an algorithm that captures the faulty probabilities of DNN variables under transient faults and thus provides correct RA estimates validated by hardware. To accelerate RA estimation, we reformulate the RA calculation as a Monte Carlo integration problem and solve it using importance sampling driven by DNN-specific heuristics. Using our lightweight RA estimation method, we show that transient faults lead to far greater accuracy degradation than what today's DNN resiliency tools estimate. We show how our RA estimation tool can help design more resilient DNNs by integrating it with a Network Architecture Search framework.
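The idea of treating RA as a Monte Carlo integration problem can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not specify the heuristics or the fault model. Everything below is an assumption made for illustration, including the four hypothetical variables, their per-variable fault probabilities, their accuracy-given-fault values, the proposal distribution, and all function names. The sketch only shows a generic importance-sampling estimator of a probability-weighted accuracy, and how an equal-probability assumption can give a different number than a non-uniform one.

```python
import random

# Hypothetical toy data (NOT from the paper): four DNN variables, the
# probability that a transient fault lands in each one, and the network
# accuracy observed when that variable is faulty.
FAULT_PROB = [0.1, 0.2, 0.3, 0.4]        # sums to 1
ACC_GIVEN_FAULT = [0.9, 0.5, 0.95, 0.2]
PROPOSAL = [0.2, 0.2, 0.4, 0.2]          # assumed heuristic sampling weights


def exact_ra(fault_prob, acc_given_fault):
    """Probability-weighted accuracy: sum_i p_i * acc_i."""
    return sum(p * a for p, a in zip(fault_prob, acc_given_fault))


def uniform_ra(acc_given_fault):
    """Accuracy under the assumption that every variable is equally likely to be faulty."""
    return sum(acc_given_fault) / len(acc_given_fault)


def importance_sampled_ra(fault_prob, acc_given_fault, proposal, n_samples, seed=0):
    """Estimate sum_i p_i * acc_i by drawing i from `proposal` and reweighting by p_i / q_i."""
    total_q = sum(proposal)
    q = [w / total_q for w in proposal]   # normalize the proposal
    rng = random.Random(seed)
    draws = rng.choices(range(len(q)), weights=q, k=n_samples)
    total = 0.0
    for i in draws:
        total += acc_given_fault[i] * fault_prob[i] / q[i]
    return total / n_samples


if __name__ == "__main__":
    print("exact RA:            ", exact_ra(FAULT_PROB, ACC_GIVEN_FAULT))
    print("equal-probability RA:", uniform_ra(ACC_GIVEN_FAULT))
    print("sampled RA:          ",
          importance_sampled_ra(FAULT_PROB, ACC_GIVEN_FAULT, PROPOSAL, 20000))
```

In a realistic setting each accuracy value would come from an expensive fault-injection run, which is why sampling a subset of variables instead of enumerating all of them is attractive. The estimator is unbiased as long as the proposal assigns nonzero probability to every variable with nonzero fault probability.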