Paper Title

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Paper Authors

Amarasinghe, Kasun, Rodolfa, Kit T., Jesus, Sérgio, Chen, Valerie, Balayan, Vladimir, Saleiro, Pedro, Bizarro, Pedro, Talwalkar, Ameet, Ghani, Rayid

Paper Abstract

Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations in real-world settings have shortcomings in their design, resulting in limited conclusions about methods' real-world utility. In this work, we seek to bridge this gap by conducting a study that evaluates three popular explainable ML methods in a setting consistent with the intended deployment context. We build on a previous study on e-commerce fraud detection and make crucial modifications to its setup, relaxing the simplifying assumptions made in the original work that departed from the deployment context. In doing so, we draw drastically different conclusions from the earlier work and find no evidence for the incremental utility of the tested methods in the task. Our results highlight how seemingly trivial experimental design choices can yield misleading conclusions, with lessons about the necessity of not only evaluating explainable ML methods using tasks, data, users, and metrics grounded in the intended deployment contexts but also developing methods tailored to specific applications. In addition, we believe the design of this experiment can serve as a template for future study designs evaluating explainable ML methods in other real-world contexts.
