The BLAS API of BLASFEO
ACM Transactions on Mathematical Software (IF 2.7), Pub Date: 2020-05-22, DOI: 10.1145/3378671
Gianluca Frison, Tommaso Sartor, Andrea Zanelli, Moritz Diehl

Basic Linear Algebra Subroutines For Embedded Optimization (BLASFEO) is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization and other applications targeting relatively small matrices. BLASFEO defines an application programming interface (API) which uses a packed matrix format as its native format. This format is analogous to the internal memory buffers of optimized BLAS, but it is exposed to the user and it removes the packing cost from the routine call. For matrices fitting in cache, BLASFEO outperforms optimized BLAS implementations, both open source and proprietary. This article investigates the addition of a standard BLAS API to the BLASFEO framework, and proposes an implementation switching between two or more algorithms optimized for different matrix sizes. Thanks to the modular assembly framework in BLASFEO, tailored linear algebra kernels with mixed column- and panel-major arguments are easily developed. This BLAS API has lower performance than the BLASFEO API, but it nonetheless outperforms optimized BLAS and especially LAPACK libraries for matrices fitting in cache. Therefore, it can boost a wide range of applications, where standard BLAS and LAPACK libraries are employed and the matrix size is moderate. In particular, this article investigates the benefits in scientific programming languages such as Octave, SciPy, and Julia.

Updated: 2020-05-22