Number parsing at a gigabyte per second, Software: Practice and Experience


Number parsing at a gigabyte per second
Software: Practice and Experience (IF 2.6) Pub Date: 2021-05-11, DOI: 10.1002/spe.2984
Daniel Lemire


With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-precision arithmetic. However, we need at most 17 digits to represent 64-bit standard floating-point numbers (IEEE 754). Thus, we can represent the decimal significand with a single 64-bit word. By combining the significand and precomputed tables, we can compute the nearest floating-point number using as few as one or two 64-bit multiplications. Our implementation can be several times faster than conventional functions present in standard C libraries on modern 64-bit systems (Intel, AMD, ARM, and POWER9). Our work is available as open source software used by major systems such as Apache Arrow and Yandex ClickHouse. The Go standard library has adopted a version of our approach.
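The idea of holding the decimal significand in one machine word and combining it with a precomputed table can be illustrated with a well-known exact shortcut. The sketch below is not the paper's full algorithm, which covers the general case with 64-bit integer multiplications. It only handles inputs whose significand fits in 53 bits and whose decimal exponent lies between -22 and 22, and it defers everything else to Python's built-in `float`. All names (`POW10`, `split_decimal`, `parse_fast_path`, `parse_number`) are hypothetical and chosen for illustration, and the code assumes the platform performs IEEE 754 binary64 arithmetic with round-to-nearest.

```python
# Illustrative sketch only: the names here are hypothetical and are not
# taken from the paper's implementation.

# Powers of ten that a 64-bit float represents exactly (10**0 .. 10**22).
POW10 = [float(10 ** k) for k in range(23)]

# Every integer up to 2**53 is exactly representable as a 64-bit float.
MAX_EXACT_SIGNIFICAND = 2 ** 53


def split_decimal(text):
    """Split a decimal string such as '-12.5e3' into (negative, w, q)
    so that the value equals (-1)**negative * w * 10**q, with w an integer."""
    s = text.strip().lower()
    negative = s.startswith("-")
    if s[:1] in ("+", "-"):
        s = s[1:]
    mantissa, sep, exp_part = s.partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    digits = int_part + frac_part
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a decimal number: {text!r}")
    exponent = int(exp_part) if sep else 0  # int('') raises ValueError
    return negative, int(digits), exponent - len(frac_part)


def parse_fast_path(text):
    """Return the nearest float, or None when the shortcut does not apply."""
    negative, w, q = split_decimal(text)
    if w > MAX_EXACT_SIGNIFICAND or not -22 <= q <= 22:
        return None  # the caller needs a slower, general algorithm
    # Both operands are exact floats, so a single multiplication or
    # division rounds once and yields the nearest representable value.
    value = float(w) * POW10[q] if q >= 0 else float(w) / POW10[-q]
    return -value if negative else value


def parse_number(text):
    """Try the shortcut first and fall back to the built-in parser."""
    result = parse_fast_path(text)
    return float(text) if result is None else result
```

For example, `parse_fast_path("0.1")` takes the shortcut with significand 1 and exponent -1, while `parse_fast_path("1e23")` returns `None` because 10^23 is not exactly representable, so `parse_number` falls back to `float`.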


Updated: 2021-07-02
