Paper Title

Are Metrics Enough? Guidelines for Communicating and Visualizing Predictive Models to Subject Matter Experts

Authors

Ashley Suh, Gabriel Appleby, Erik W. Anderson, Luca Finelli, Remco Chang, Dylan Cashman

Abstract

Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts (SMEs). Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for subject matter experts to feel confident in their decision to use a model. As a result, models may fail in unexpected ways or go entirely unused, as subject matter experts disregard poorly presented models in favor of familiar, yet arguably substandard methods. In this paper, we describe an iterative study conducted with both subject matter experts and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share common goals of understanding the data and predictions of the model, friction can stem from unfamiliar terms, metrics, and visualizations - limiting the transfer of knowledge to SMEs and discouraging clarifying questions during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We provide a demonstration of our guidelines in a regression modeling scenario and elicit feedback on their use from subject matter experts. From our demonstration, subject matter experts were more comfortable discussing a model's performance, more aware of the trade-offs for the presented model, and better equipped to assess the model's risks - ultimately informing and contextualizing the model's use beyond text and numbers.
