Paper title
A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance
Paper authors
Paper abstract
Plotting a learner's generalization performance against the training set size results in a so-called learning curve. This tool, providing insight into the behavior of the learner, is also practically valuable for model selection, predicting the effect of more training data, and reducing the computational complexity of training. We set out to make the (ideal) learning curve concept precise and briefly discuss the aforementioned usages of such curves. The larger part of this survey's focus, however, is on learning curves that show that more data does not necessarily lead to better generalization performance, a result that seems surprising to many researchers in the field of artificial intelligence. We point out the significance of these findings and conclude our survey with an overview and discussion of open problems in this area that warrant further theoretical and empirical investigation.