Delayed approximate matrix assembly in multigrid with dynamic precisions, arXiv - CS - Mathematical Software



arXiv - CS - Mathematical Software, Pub Date: 2020-05-07, DOI: arxiv-2005.03606
Charles D. Murray and Tobias Weinzierl

The accurate assembly of the system matrix is an important step in any code that solves partial differential equations on a mesh. We either explicitly set up a matrix, or we work in a matrix-free environment where we have to be able to quickly return matrix entries upon demand. Either way, the construction can become costly due to non-trivial material parameters entering the equations, multigrid codes requiring cascades of matrices that depend upon each other, or dynamic adaptive mesh refinement that necessitates the recomputation of matrix entries or the whole equation system throughout the solve. We propose that these constructions can be performed concurrently with the multigrid cycles. Initial geometric matrices and low accuracy integrations kickstart the multigrid, while improved assembly data is fed to the solver as and when it becomes available. The time to solution is improved as we eliminate an expensive preparation phase traditionally delaying the actual computation. We eliminate algorithmic latency. Furthermore, we desynchronise the assembly from the solution process. This anarchic increase of the concurrency level improves the scalability. Assembly routines are notoriously memory- and bandwidth-demanding. As we work with iteratively improving operator accuracies, we finally propose the use of a hierarchical, lossy compression scheme such that the memory footprint is brought down aggressively where the system matrix entries carry little information or are not yet available with high accuracy.
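The idea of starting the solver on a cheap operator and swapping in better assembly data as it arrives can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 1D model problem -(k u')' = f with zero boundary values, lets a single-level Gauss-Seidel sweep stand in for a multigrid cycle, and emulates the concurrent assembly sequentially by doubling the number of quadrature samples per cell after every sweep. All function names, the coefficient `conductivity`, and the parameter values are hypothetical choices made for illustration, and the lossy compression part of the abstract is not covered.

```python
import math


def cell_coefficients(k, n_cells, samples):
    """Average k over each cell with a composite midpoint rule using `samples` points."""
    h = 1.0 / n_cells
    coeffs = []
    for c in range(n_cells):
        total = 0.0
        for s in range(samples):
            x = (c + (s + 0.5) / samples) * h
            total += k(x)
        coeffs.append(total / samples)
    return coeffs


def neighbours(u, i):
    """Left and right neighbour values of interior unknown i (zero Dirichlet boundary)."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return left, right


def gauss_seidel_sweep(u, coeffs, f, h):
    """One in-place Gauss-Seidel sweep; stands in for a multigrid cycle in this sketch."""
    for i in range(len(u)):
        k_left, k_right = coeffs[i], coeffs[i + 1]
        left, right = neighbours(u, i)
        u[i] = (h * h * f[i] + k_left * left + k_right * right) / (k_left + k_right)


def residual_norm(u, coeffs, f, h):
    """Maximum norm of the residual f - A u for the operator defined by `coeffs`."""
    worst = 0.0
    for i in range(len(u)):
        k_left, k_right = coeffs[i], coeffs[i + 1]
        left, right = neighbours(u, i)
        applied = ((k_left + k_right) * u[i] - k_left * left - k_right * right) / (h * h)
        worst = max(worst, abs(f[i] - applied))
    return worst


def solve(k, n_cells, start_samples, max_samples, sweeps):
    """Iterate while the operator accuracy improves from start_samples to max_samples.

    The refinement after each sweep emulates, sequentially, assembly data that a
    concurrent task would deliver while the solver is already running.
    """
    h = 1.0 / n_cells
    f = [1.0] * (n_cells - 1)
    u = [0.0] * (n_cells - 1)
    samples = start_samples
    coeffs = cell_coefficients(k, n_cells, samples)
    for _ in range(sweeps):
        gauss_seidel_sweep(u, coeffs, f, h)
        if samples < max_samples:
            samples = min(2 * samples, max_samples)
            coeffs = cell_coefficients(k, n_cells, samples)
    return u, coeffs, f, h


def conductivity(x):
    """Hypothetical smooth, strictly positive material parameter."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    u, coeffs, f, h = solve(conductivity, n_cells=16, start_samples=1,
                            max_samples=64, sweeps=3000)
    print("final residual:", residual_norm(u, coeffs, f, h))
```

Calling `solve` with `start_samples` equal to `max_samples` gives the conventional variant with the accurate operator assembled up front. In this toy setting both variants converge to the same discrete solution, because the early sweeps on the coarse operator only affect the starting point of the later ones.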


Updated: 2020-06-19
