Paper Title
Artificial Interrogation for Attributing Language Models
Paper Authors
Paper Abstract
This paper presents solutions to the Machine Learning Model Attribution Challenge (MLMAC), organized jointly by MITRE, Microsoft, Schmidt-Futures, Robust-Intelligence, Lincoln-Network, and the HuggingFace community. The challenge provides twelve open-source base versions of popular language models developed by well-known organizations and twelve fine-tuned language models for text generation. The names and architecture details of the fine-tuned models are kept hidden, and participants can access these models only through REST APIs developed by the organizers. Given these constraints, the goal of the contest is to identify which fine-tuned model originated from which base model. To solve this challenge, we assume that a fine-tuned model and its corresponding base version must share a similar vocabulary and a matching syntactic writing style that resonates in their generated outputs. Our strategy is to develop a set of queries to interrogate the base and fine-tuned models, and then to perform one-to-many pairing between them based on similarities in their generated responses, where more than one fine-tuned model can pair with a base model but not vice versa. We employ four distinct approaches to measure the resemblance between the responses generated by the two sets of models. The first approach uses machine translation evaluation metrics, and the second uses a vector space model. The third approach uses state-of-the-art Transformer models for multi-class text classification. Lastly, the fourth approach uses a set of Transformer-based binary text classifiers, one for each provided base model, to perform multi-class text classification in a one-vs-all fashion. This paper reports implementation details, comparisons, and experimental studies of these approaches, along with the final results obtained.
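To make the pairing strategy concrete, the sketch below illustrates the vector-space variant (the second approach) under simplifying assumptions: it presumes the query responses have already been collected from the REST APIs, uses TF-IDF with cosine similarity as one possible vector space model, and the model names and response strings are placeholders rather than the actual challenge models or prompts. It is a minimal illustration of one-to-many attribution, not the paper's exact implementation.

```python
# Minimal sketch: attribute each fine-tuned model to the most similar base model
# using a TF-IDF vector space over their generated responses (placeholders below).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed inputs: model name -> concatenated responses to the same query set.
base_responses = {"base-A": "...", "base-B": "..."}
finetuned_responses = {"ft-1": "...", "ft-2": "..."}

base_names = list(base_responses)
ft_names = list(finetuned_responses)

# Fit a single TF-IDF space over all responses so the vectors are comparable.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(
    [base_responses[b] for b in base_names]
    + [finetuned_responses[f] for f in ft_names]
)
base_vecs = matrix[: len(base_names)]
ft_vecs = matrix[len(base_names):]

# One-to-many pairing: each fine-tuned model maps to its most similar base model;
# several fine-tuned models may share the same base, but not vice versa.
similarity = cosine_similarity(ft_vecs, base_vecs)
attribution = {ft: base_names[similarity[i].argmax()] for i, ft in enumerate(ft_names)}
print(attribution)
```

The same pairing loop applies to the other three approaches; only the similarity score changes, for example a machine translation metric such as BLEU, or the class probabilities produced by the Transformer classifiers.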