Paper Title
SCOPE: Safe Exploration for Dynamic Computer Systems Optimization
Paper Authors
Paper Abstract
Modern computer systems need to execute under strict safety constraints (e.g., a power limit), but doing so often conflicts with their ability to deliver high performance (i.e., minimal latency). Prior work uses machine learning to automatically tune hardware resources so that system execution meets safety constraints optimally. Such solutions monitor past system executions to learn the system's behavior under different hardware resource allocations, then dynamically tune resources to optimize application execution. However, system behavior can change significantly between different applications and even between different inputs of the same application. Hence, models learned from data collected a priori are often suboptimal and violate safety constraints when used with new applications and inputs. To address this limitation, we introduce the concept of an execution space, which is the cross product of hardware resources, input features, and applications. To dynamically and safely allocate hardware resources from the execution space, we present SCOPE, a resource manager that leverages a novel safe exploration framework. We evaluate SCOPE's ability to deliver improved latency while minimizing power constraint violations by dynamically configuring hardware while running a variety of Apache Spark applications. Compared to prior approaches that minimize power constraint violations, SCOPE consumes comparable power while improving latency by up to 9.5X. Compared to prior approaches that minimize latency, SCOPE achieves similar latency but reduces power constraint violation rates by up to 45.88X, achieving almost zero safety constraint violations across all applications.
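The execution space defined in the abstract, a cross product of hardware resources, input features, and applications, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the configuration values, the application names, the `toy_power_model` function, and the filtering step are hypothetical and are not SCOPE's actual configurations or its safe exploration algorithm.

```python
from itertools import product

# Hypothetical values chosen only for illustration; none come from the paper.
hardware_configs = [
    {"cores": 4, "freq_ghz": 1.2},
    {"cores": 8, "freq_ghz": 2.4},
]
input_features = [{"input_gb": 1}, {"input_gb": 10}]
applications = ["wordcount", "kmeans"]

# The execution space is the cross product of the three sets:
# every (hardware config, input features, application) combination.
execution_space = list(product(hardware_configs, input_features, applications))


def toy_power_model(hw, inp, app):
    """Placeholder power estimate in watts (an assumption, not a real model)."""
    return 10.0 * hw["cores"] * hw["freq_ghz"] / 2.4 + 0.5 * inp["input_gb"]


def within_power_limit(space, predict_power, power_limit_watts):
    """Keep only the points whose predicted power does not exceed the limit."""
    return [point for point in space if predict_power(*point) <= power_limit_watts]


candidates = within_power_limit(execution_space, toy_power_model, 50.0)
print(len(execution_space), len(candidates))
```

The sketch only shows why the space grows multiplicatively with each dimension and why a model trained on a few applications or inputs covers a small part of it. A real resource manager would have to estimate power online rather than rely on a fixed function like the one above.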