Paper title
Modernizing the HPC System Software Stack
Paper authors
Paper abstract
Through the 1990s, HPC centers at national laboratories, universities, and other large sites designed distributed system architectures and software stacks that enabled extreme-scale computing. By the 2010s, these centers were eclipsed in scale by web-scale and cloud computing architectures, and today even upcoming exascale HPC systems are orders of magnitude smaller than the datacenters operated by large web companies. Meanwhile, the HPC community has allowed system software designs to stagnate, relying on incremental changes to tried-and-true designs to move between generations of systems. We contend that a modern system software stack that focuses on manageability, scalability, security, and modern methods will benefit the entire HPC community. In this paper, we break down the logical parts of a typical HPC system software stack, look at more modern ways to meet their needs, and recommend future work that would help the community move in that direction.