Faster Math Functions, Soundly
arXiv - CS - Mathematical Software, Pub Date: 2021-07-12, DOI: arxiv-2107.05761
Ian Briggs, Pavel Panchekha

Standard library implementations of functions like sin and exp optimize for accuracy, not speed, because they are intended for general-purpose use. But applications tolerate inaccuracy from cancellation, rounding error, and singularities (sometimes even very high error), and many applications could tolerate error in function implementations as well. This raises an intriguing possibility: speeding up numerical code by tuning standard function implementations. This paper thus introduces OpTuner, an automatic method for selecting the best implementation of mathematical functions at each use site. OpTuner assembles dozens of implementations for the standard mathematical functions from across the speed-accuracy spectrum. OpTuner then uses error Taylor series and integer linear programming to compute optimal assignments of function implementations to use sites and presents the user with a speed-accuracy Pareto curve they can use to speed up their code. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation, leading to a whole-program speedup of 9% with no change in the program output (whereas human efforts result in slower code and lower-quality output). In a broader study of 37 standard benchmarks, OpTuner matches 216 implementations to 89 use sites and demonstrates speed-ups of 107% for negligible decreases in accuracy and of up to 438% for error-tolerant applications.
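
To make the idea of a speed-accuracy Pareto curve over per-use-site choices concrete, here is a minimal Python sketch. It is not OpTuner's code: it enumerates assignments by brute force instead of using integer linear programming, and it replaces the error Taylor series with an assumed linear model in which each use site contributes its sensitivity weight times the chosen implementation's error bound. Every implementation name, cost, error bound, and weight below is hypothetical.

```python
from itertools import product

# Hypothetical implementations per function: (name, cost, relative error bound).
# These numbers are illustrative only and are not taken from the paper.
IMPLS = {
    "sin": [("sin_accurate", 40.0, 1e-16),
            ("sin_medium", 20.0, 1e-8),
            ("sin_fast", 8.0, 1e-3)],
    "exp": [("exp_accurate", 30.0, 1e-16),
            ("exp_fast", 10.0, 1e-4)],
}

# Hypothetical use sites: (function, sensitivity weight).
# Assumed first-order model: total error ~ sum(weight_i * error_i).
USE_SITES = [("sin", 1.0), ("exp", 2.0), ("sin", 0.5)]


def enumerate_assignments(use_sites, impls):
    """Yield (names, total_cost, total_error) for every possible assignment."""
    choices = [impls[func] for func, _ in use_sites]
    for combo in product(*choices):
        cost = sum(c for _, c, _ in combo)
        error = sum(w * e for (_, w), (_, _, e) in zip(use_sites, combo))
        yield tuple(name for name, _, _ in combo), cost, error


def pareto_front(points):
    """Keep assignments that no cheaper-or-equal assignment beats on error."""
    front = []
    best_error = float("inf")
    for names, cost, error in sorted(points, key=lambda p: (p[1], p[2])):
        if error < best_error:
            front.append((names, cost, error))
            best_error = error
    return front


def best_under_budget(front, max_error):
    """Return the cheapest Pareto point within the error budget, or None."""
    feasible = [p for p in front if p[2] <= max_error]
    return min(feasible, key=lambda p: p[1]) if feasible else None


if __name__ == "__main__":
    front = pareto_front(enumerate_assignments(USE_SITES, IMPLS))
    for names, cost, error in front:
        print(f"cost={cost:6.1f}  error={error:.2e}  {names}")
```

Brute-force enumeration grows exponentially with the number of use sites, which is presumably why a solver-based formulation such as the integer linear programming mentioned in the abstract is attractive at larger scale; the sketch only illustrates what the resulting curve represents.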


Updated: 2021-07-14
