Implementing High-Performance Complex Matrix Multiplication via the 1M Method
SIAM Journal on Scientific Computing (IF 3.0), Pub Date: 2020-09-15, DOI: 10.1137/19m1282040
Field G. Van Zee

SIAM Journal on Scientific Computing, Volume 42, Issue 5, Page C221-C244, January 2020.
Almost all efforts to optimize high-performance matrix-matrix multiplication have been focused on the case where matrices contain real elements. The community's collective assumption appears to have been that the techniques and methods developed for the real domain carry over directly to the complex domain. As a result, implementors have mostly overlooked a class of methods that compute complex matrix multiplication using only real matrix products. This is the second in a series of articles that investigate these so-called induced methods. In the previous article, we found that algorithms based on the more generally applicable of the two methods, the 4m method, lead to implementations that, for various reasons, often underperform their real domain counterparts. To overcome these limitations, we derive a superior 1m method for expressing complex matrix multiplication, one which addresses virtually all of the shortcomings inherent in 4m. Implementations are developed within the BLIS framework, and testing on microarchitectures by three vendors confirms that the 1m method yields performance that is generally competitive with solutions based on conventionally implemented complex kernels, sometimes even outperforming vendor libraries.
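
The abstract does not spell out either construction, so the following is only a minimal sketch of what "complex matrix multiplication using only real matrix products" can look like. It assumes the textbook identities: four real products for a 4m-style computation, and a single real product over an expanded operand (each complex entry of A stored as a 2x2 real block, each entry of B as a 2x1 real column) for a 1m-style computation. The function names (`real_matmul`, `matmul_4m`, `matmul_1m_style`) are hypothetical, and the paper's actual packing formats and BLIS kernels may differ.

```python
def real_matmul(a, b):
    """Plain real matrix product of two lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(rows)]


def matmul_4m(a, b):
    """Four real products: Cr = Ar*Br - Ai*Bi, Ci = Ar*Bi + Ai*Br."""
    ar = [[z.real for z in row] for row in a]
    ai = [[z.imag for z in row] for row in a]
    br = [[z.real for z in row] for row in b]
    bi = [[z.imag for z in row] for row in b]
    rr, ii = real_matmul(ar, br), real_matmul(ai, bi)
    ri, ir = real_matmul(ar, bi), real_matmul(ai, br)
    m, n = len(a), len(b[0])
    return [[complex(rr[i][j] - ii[i][j], ri[i][j] + ir[i][j]) for j in range(n)]
            for i in range(m)]


def matmul_1m_style(a, b):
    """One real product over expanded operands (assumed embedding, see lead-in)."""
    m, k, n = len(a), len(b), len(b[0])
    # Each complex a[i][p] becomes the 2x2 real block [[re, -im], [im, re]].
    a_exp = [[0.0] * (2 * k) for _ in range(2 * m)]
    for i in range(m):
        for p in range(k):
            z = a[i][p]
            a_exp[2 * i][2 * p], a_exp[2 * i][2 * p + 1] = z.real, -z.imag
            a_exp[2 * i + 1][2 * p], a_exp[2 * i + 1][2 * p + 1] = z.imag, z.real
    # Each complex b[p][j] becomes the 2x1 real column [re; im].
    b_exp = [[0.0] * n for _ in range(2 * k)]
    for p in range(k):
        for j in range(n):
            b_exp[2 * p][j], b_exp[2 * p + 1][j] = b[p][j].real, b[p][j].imag
    c_exp = real_matmul(a_exp, b_exp)
    # Rows 2i and 2i+1 of the real result hold the real and imaginary parts of row i.
    return [[complex(c_exp[2 * i][j], c_exp[2 * i + 1][j]) for j in range(n)]
            for i in range(m)]


A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
B = [[2 - 1j], [1 + 1j]]
print(matmul_4m(A, B))
print(matmul_1m_style(A, B))
```

Both functions compute the same product. The difference is that the second needs only one call to a real matrix multiplication routine, with the real and imaginary parts of the result interleaved by row.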


Updated: 2020-10-16