Paper Title
Rethinking Streaming Machine Learning Evaluation
Paper Authors
Paper Abstract
While most work on evaluating machine learning (ML) models focuses on computing accuracy on batches of data, tracking accuracy alone in a streaming setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately identify when models are performing unexpectedly. In this position paper, we discuss how the nature of streaming ML problems introduces new real-world challenges (e.g., delayed arrival of labels) and recommend additional metrics to assess streaming ML performance.
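To make the abstract's claim concrete, here is a minimal sketch (not from the paper; all names and data are illustrative) of why a single cumulative accuracy number can mask recent degradation in a stream, and how delayed labels complicate evaluation: a label may only become available some number of timesteps after the prediction is made, so metrics can only be updated as labels arrive.

```python
from collections import deque

def windowed_accuracy(events, window=4):
    """Sliding-window accuracy over a label-delayed stream (illustrative sketch).

    `events` is an iterable of (prediction, label, label_delay) tuples in
    timestamp order; a label with delay d only becomes available d steps
    after the prediction is made, so it cannot be scored before then.
    Returns the window accuracy observed at each timestep (None until the
    first label arrives).
    """
    pending = []                    # (available_at, correct) labels in flight
    recent = deque(maxlen=window)   # correctness of most recent arrived labels
    history = []
    for t, (pred, label, delay) in enumerate(events):
        pending.append((t + delay, pred == label))
        # Release only the labels that have arrived by time t.
        still_pending = []
        for available_at, correct in pending:
            if available_at <= t:
                recent.append(correct)
            else:
                still_pending.append((available_at, correct))
        pending = still_pending
        history.append(sum(recent) / len(recent) if recent else None)
    return history

# Model is correct early, then degrades; late labels arrive 2 steps late.
stream = [(1, 1, 0)] * 4 + [(0, 1, 2)] * 4
print(windowed_accuracy(stream, window=4))
```

In this toy stream the cumulative accuracy over all arrived labels is still about 0.67 at the final step, while the window accuracy has already fallen to 0.5, and the two wrong predictions still in flight have not been counted at all. This is the kind of gap between batch-style accuracy and streaming behavior the paper argues additional metrics should expose.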