Paper title
Cross-validation based adaptive sampling for Gaussian process models
Paper authors
Abstract
In many real-world applications, we are interested in approximating black-box, costly functions as accurately as possible with the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of a complex computer code. We consider the problem of extending an initial experiment (set of model runs) sequentially to improve the emulator. A sequential sampling approach based on leave-one-out (LOO) cross-validation is proposed that can easily be extended to a batch mode. This is a desirable property since it saves the user time when parallel computing is available. After fitting a GP to the training data points, the expected squared LOO (ES-LOO) error is calculated at each design point. ES-LOO is used as a measure to identify important data points: when this quantity is large at a point, the quality of prediction depends a great deal on that point, and adding more samples nearby could improve the accuracy of the GP. As a result, it is reasonable to select the next sample where ES-LOO is maximised. However, ES-LOO is only known at the experimental design points and needs to be estimated at unobserved locations. To do this, a second GP is fitted to the ES-LOO errors, and the location of the maximum of a modified expected improvement (EI) criterion is chosen as the next sample. EI is a popular acquisition function in Bayesian optimisation and is used to trade off between local and global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed; it is more explorative than EI and allows us to discover unexplored regions. Our results show that the proposed sampling method is promising.
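For reference, the standard EI acquisition function mentioned in the abstract has the following textbook form for maximisation (the paper's pseudo expected improvement is a modification of this criterion; its exact form is not given in the abstract). Here $y^{*}$ is the current best observed value and $\mu(x)$, $\sigma(x)$ are the GP predictive mean and standard deviation:

\[
\mathrm{EI}(x) \;=\; \bigl(\mu(x) - y^{*}\bigr)\,\Phi(z) \;+\; \sigma(x)\,\phi(z),
\qquad
z = \frac{\mu(x) - y^{*}}{\sigma(x)},
\]

where $\Phi$ and $\phi$ are the standard normal CDF and PDF, and $\mathrm{EI}(x) = 0$ when $\sigma(x) = 0$. The first term rewards points predicted to improve on $y^{*}$ (exploitation); the second rewards predictive uncertainty (exploration).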
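The sequential procedure described above can be sketched in a few lines. The following is a minimal illustrative sketch only, not the authors' implementation: the 1-D test function, the brute-force LOO computation, the log transform of the errors, and the candidate-grid maximisation of plain EI (rather than the paper's pseudo-EI) are all assumptions made to keep the example short and self-contained.

```python
# Sketch of ES-LOO adaptive sampling with two GPs (scikit-learn), assuming a
# hypothetical 1-D black-box function and plain EI instead of pseudo-EI.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Hypothetical expensive black-box function (1-D, for illustration only).
    return np.sin(3 * x) + 0.5 * x

def loo_squared_errors(X, y):
    # Brute-force leave-one-out: refit the GP n times and record the squared
    # prediction error at each held-out design point. (An analytic LOO formula
    # exists for GPs and is cheaper; brute force keeps the sketch short.)
    errs = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
        gp.fit(X[mask], y[mask])
        errs[i] = (y[i] - gp.predict(X[i:i + 1])[0]) ** 2
    return errs

def expected_improvement(mu, sigma, best):
    # Standard EI for maximisation; the paper employs a modified
    # "pseudo expected improvement" to discourage clustering.
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(8, 1))      # initial experimental design
y = f(X).ravel()

for _ in range(5):                       # sequential sampling iterations
    es_loo = loo_squared_errors(X, y)
    log_err = np.log(es_loo + 1e-12)     # log transform for positivity
    # Second GP fitted to the (log) ES-LOO errors at the design points.
    gp_err = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gp_err.fit(X, log_err)
    cand = np.linspace(-2, 2, 201).reshape(-1, 1)
    mu, sd = gp_err.predict(cand, return_std=True)
    ei = expected_improvement(mu, sd, log_err.max())
    x_new = cand[np.argmax(ei)]          # next sample maximises EI
    X = np.vstack([X, x_new])
    y = np.append(y, f(x_new)[0])

print(X.shape)  # design grown from 8 to 13 points
```

A batch-mode extension, as the abstract notes, would select several maximisers of the (pseudo-)EI surface per iteration before re-running the expensive code in parallel.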