Paper Title


LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models

Authors

Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg

Abstract


The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from the outside, executing behavioral tests, and analyzing the salience of input features, while the internal prediction construction process remains largely not understood. In this work, we introduce LM-Debugger, an interactive debugging tool for transformer-based LMs, which provides a fine-grained interpretation of the model's internal prediction process, as well as a powerful framework for intervening in LM behavior. For its backbone, LM-Debugger relies on a recent method that interprets the inner token representations and their updates by the feed-forward layers in the vocabulary space. We demonstrate the utility of LM-Debugger for single-prediction debugging by inspecting the internal disambiguation process performed by GPT2. Moreover, we show how easily LM-Debugger allows users to shift model behavior in a direction of their choice, by identifying a few vectors in the network and inducing effective interventions in the prediction process. We release LM-Debugger as an open-source tool and a demo over GPT2 models.
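The backbone method the abstract refers to can be illustrated with a toy sketch: an internal vector is interpreted by scoring it against every row of the output embedding matrix, and the top-scoring rows name the tokens that vector promotes. This is a minimal illustration of the general idea only, not LM-Debugger's actual code; the vocabulary, embedding values, and function name below are made up.

```python
# Toy sketch of vocabulary-space projection: score a hidden vector h
# against each token embedding (a dot product, i.e. E @ h) and return
# the tokens it most strongly promotes. All values are illustrative.

def project_to_vocab(h, E, vocab, k=2):
    """Return the k vocabulary tokens whose embeddings best align with h."""
    scores = [sum(hi * ei for hi, ei in zip(h, e)) for e in E]
    ranked = sorted(zip(vocab, scores), key=lambda p: p[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]

# A 3-token vocabulary with 2-dimensional embeddings (toy numbers).
vocab = ["cat", "dog", "car"]
E = [[1.0, 0.0],   # "cat"
     [0.9, 0.1],   # "dog"
     [0.0, 1.0]]   # "car"

h = [1.0, 0.2]  # an internal representation to interpret
print(project_to_vocab(h, E, vocab))  # → ['cat', 'dog']
```

In a real transformer, `E` would be the model's output embedding matrix and `h` a token representation (or a feed-forward update vector) at some layer; reading the top tokens at each layer exposes how the prediction is built up, and promoting or suppressing specific vectors gives the intervention mechanism the abstract describes.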
