Paper title
Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology
Authors
Abstract
A general control policy framework based on deep reinforcement learning (DRL) is introduced for closed-loop decision making in subsurface flow settings. Traditional closed-loop modeling workflows in this context involve the repeated application of data assimilation/history matching and robust optimization steps. Data assimilation can be particularly challenging in cases where both the geological style (scenario) and individual model realizations are uncertain. The closed-loop reservoir management (CLRM) problem is formulated here as a partially observable Markov decision process, with the associated optimization problem solved using a proximal policy optimization algorithm. This provides a control policy that instantaneously maps flow data observed at wells (as are available in practice) to optimal well pressure settings. The policy is represented by temporal convolution and gated transformer blocks. Training is performed in a preprocessing step with an ensemble of prior geological models, which can be drawn from multiple geological scenarios. Example cases involving the production of oil via water injection, with both 2D and 3D geological models, are presented. The DRL-based methodology is shown to increase net present value (NPV) by 15% (for the 2D cases) and 33% (for the 3D cases) relative to robust optimization over prior models, and to give an average NPV improvement of 4% relative to traditional CLRM. The solutions from the control policy are found to be comparable to those from deterministic optimization, in which the geological model is assumed to be known, even when multiple geological scenarios are considered. The control policy approach results in a 76% decrease in computational cost relative to traditional CLRM with the algorithms and parameter settings considered in this work.
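The abstract compares the control policy against robust optimization over prior models. One plausible way to write the two objectives is sketched below. The notation is assumed for illustration and is not taken from the paper: $\mathbf{m}_i$ ($i = 1,\dots,N$) are prior geological realizations, possibly drawn from several scenarios; $\mathbf{u}_t$ are well pressure settings at control step $t$; $\mathbf{o}^{(i)}_{<t}$ are the flow data observed at the wells of realization $i$ before step $t$; and $\pi_\theta$ is the parameterized policy. Robust optimization selects a single open-loop control sequence for the whole ensemble. The policy, by contrast, produces controls from each realization's own observed data, so it can adapt without an explicit history-matching step.

```latex
% Robust optimization over prior models: one control sequence shared by all realizations
\mathbf{u}^{\ast}_{\mathrm{RO}} = \arg\max_{\mathbf{u}_{1:T}} \; \frac{1}{N} \sum_{i=1}^{N} \mathrm{NPV}\left(\mathbf{u}_{1:T};\, \mathbf{m}_i\right)

% Control policy (illustrative): controls are sampled from a policy conditioned on observed well data
\theta^{\ast} = \arg\max_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ \mathrm{NPV}\left(\mathbf{u}^{(i)}_{1:T};\, \mathbf{m}_i\right) \right],
\qquad \mathbf{u}^{(i)}_{t} \sim \pi_{\theta}\left(\cdot \mid \mathbf{o}^{(i)}_{<t}\right)
```

The expectation is over the stochastic actions that PPO uses during training. At deployment the policy is simply evaluated on the field's observed data, which is why no data assimilation is needed at that stage.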
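The reported gains are expressed in NPV, but the abstract does not define it. For waterflooding, a common form is discounted oil revenue minus water production and injection costs. The sketch below assumes that form. The function name `npv`, the prices, the costs, and the discount rate are hypothetical placeholders, not values from the paper. In a DRL formulation, one natural choice (also an assumption here) is to use each control step's discounted cash flow as the reward. With that choice, the undiscounted episode return equals this NPV.

```python
from typing import Sequence


def npv(
    q_oil: Sequence[float],      # field oil production rate in each control step (STB/day)
    q_wprod: Sequence[float],    # field water production rate in each step (STB/day)
    q_winj: Sequence[float],     # field water injection rate in each step (STB/day)
    dt_days: Sequence[float],    # duration of each control step (days)
    oil_price: float = 50.0,     # hypothetical oil price ($/STB)
    wprod_cost: float = 5.0,     # hypothetical water production cost ($/STB)
    winj_cost: float = 5.0,      # hypothetical water injection cost ($/STB)
    annual_rate: float = 0.1,    # hypothetical annual discount rate
) -> float:
    """Sum of discounted net cash flows over control steps (hypothetical NPV form)."""
    n = len(dt_days)
    if not (len(q_oil) == len(q_wprod) == len(q_winj) == n):
        raise ValueError("all rate series must match the number of steps")
    total, t_days = 0.0, 0.0
    for qo, qwp, qwi, dt in zip(q_oil, q_wprod, q_winj, dt_days):
        t_days += dt  # discount each step's cash flow at the end of that step
        cash = (oil_price * qo - wprod_cost * qwp - winj_cost * qwi) * dt
        total += cash / (1.0 + annual_rate) ** (t_days / 365.0)
    return total
```

As an example, one undiscounted 10-day step producing 100 STB/day of oil and nothing else gives 50 × 100 × 10 = $50,000 with the placeholder price.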
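The policy is trained with proximal policy optimization. The sketch below shows the standard clipped surrogate objective from the original PPO formulation, which is the core of the update. The paper's implementation may add terms such as a value loss or an entropy bonus, or use different settings. The function name `ppo_clip_objective` and the clipping range `eps = 0.2` are illustrative assumptions, and the advantages are assumed to be precomputed.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],    # log pi_theta(a_t | o_t) under the policy being updated
    logp_old: Sequence[float],    # log pi_theta_old(a_t | o_t) under the data-collecting policy
    advantages: Sequence[float],  # advantage estimates A_t (assumed precomputed, e.g. by GAE)
    eps: float = 0.2,             # clipping range; a common default, not the paper's setting
) -> float:
    """Mean clipped surrogate objective L^CLIP, to be maximized."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                          # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)    # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)           # pessimistic (lower) bound
    return total / len(advantages)
```

Suppose the probability ratio moves outside $[1-\epsilon, 1+\epsilon]$ in the direction that would raise the objective. In that case the gain is capped, which keeps each update close to the policy that collected the rollouts. For example, a ratio of 2 with advantage +1 contributes only 1.2, while with advantage −1 it contributes the full −2.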