Title

Thermodynamics-inspired Explanations of Artificial Intelligence

Authors

Shams Mehdi, Pratyush Tiwary

Abstract

In recent years, predictive machine learning methods have gained prominence in various scientific domains. However, due to their black-box nature, it is essential to establish trust in these models before accepting them as accurate. One promising strategy for assigning trust involves employing explanation techniques that elucidate the rationale behind a black-box model's predictions in a manner that humans can understand. However, assessing the degree of human interpretability of the rationale generated by such methods is a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for assessing the degree of human interpretability associated with any linear model. Using this concept and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms (TERP), a method for generating accurate and human-interpretable explanations for black-box predictions in a model-agnostic manner. To demonstrate the wide-ranging applicability of TERP, we successfully employ it to explain various black-box model architectures, including deep learning Autoencoders, Recurrent Neural Networks, and Convolutional Neural Networks, across diverse domains such as molecular simulations, text, and image classification.
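To make the abstract's key quantity concrete, here is a minimal Python sketch of how interpretation entropy can be computed for a linear surrogate model. It assumes, for illustration, that the entropy is the Shannon entropy of the normalized absolute coefficients, so that explanations dominated by a few features score low (more human-interpretable) while diffuse explanations score high. The function name and normalization details are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def interpretation_entropy(coefficients):
    """Shannon entropy of the normalized absolute weights of a linear model.

    Sketch only: treats p_i = |w_i| / sum_j |w_j| as a probability
    distribution over features. Low entropy means importance is
    concentrated on a few features (easier for a human to interpret);
    high entropy means importance is spread thinly across many features.
    """
    w = np.abs(np.asarray(coefficients, dtype=float))
    total = w.sum()
    if total == 0.0:
        return 0.0
    p = w / total
    p = p[p > 0]  # convention: 0 * log(0) = 0
    return float(-(p * np.log(p)).sum())

# A sparse explanation (one dominant feature) vs. a diffuse one:
sparse = [0.9, 0.05, 0.03, 0.02]
diffuse = [0.25, 0.25, 0.25, 0.25]
print(interpretation_entropy(sparse))   # low entropy, more interpretable
print(interpretation_entropy(diffuse))  # maximal entropy, ln(4) ≈ 1.386
```

In the thermodynamic analogy the abstract alludes to, an entropy term of this kind is traded off against the surrogate's unfaithfulness to the black-box prediction, in the spirit of a free energy: among candidate linear explanations, TERP favors those that remain accurate while keeping interpretation entropy low.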
