论文标题

编码的稀疏矩阵计算方案,以利用部分散布器

Coded sparse matrix computation schemes that leverage partial stragglers

论文作者

Das, Anindya Bijoy, Ramamoorthy, Aditya

论文摘要

大簇上的分布式矩阵计算可能会遭受缓慢或失败的工人节点(称为stragglers)的问题,该问题可以主导整体工作执行时间。编码计算利用从擦除编码的概念来通过运行包含作业的任务的“编码”副本来减轻散乱者的效果;流浪者通常被视为擦除。尽管这很有用,但应用程序存在一些问题,例如MDS代码以一种直接的方式。几种实用的矩阵计算方案涉及稀疏矩阵。 MDS代码通常需要密集的原始矩阵的一材料组合,以破坏其固有的稀疏性。这是有问题的,因为它导致工人计算时间明显更高。此外,将慢节点视为删除,忽略了它们执行的潜在有用的部分计算。此外,某些MDS技术还遭受了重大的数值稳定性问题。在这项工作中,我们提出的计划使我们能够利用散乱者的部分计算,同时对生成编码子膜的编码级别施加约束。与以前的方法相比,这大大减少了工人的计算时间,并在解码过程中改善了数值稳定性。亚马逊Web服务(AWS)集群上的详尽数值实验支持我们的发现。

Distributed matrix computations over large clusters can suffer from the problem of slow or failed worker nodes (called stragglers) which can dominate the overall job execution time. Coded computation utilizes concepts from erasure coding to mitigate the effect of stragglers by running 'coded' copies of tasks comprising a job; stragglers are typically treated as erasures. While this is useful, there are issues with applying, e.g., MDS codes in a straightforward manner. Several practical matrix computation scenarios involve sparse matrices. MDS codes typically require dense linear combinations of submatrices of the original matrices which destroy their inherent sparsity. This is problematic as it results in significantly higher worker computation times. Moreover, treating slow nodes as erasures ignores the potentially useful partial computations performed by them. Furthermore, some MDS techniques also suffer from significant numerical stability issues. In this work we present schemes that allow us to leverage partial computation by stragglers while imposing constraints on the level of coding that is required in generating the encoded submatrices. This significantly reduces the worker computation time as compared to previous approaches and results in improved numerical stability in the decoding process. Exhaustive numerical experiments on Amazon Web Services (AWS) clusters support our findings.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源