Parsing Gigabytes of JSON per Second, arXiv - CS - Performance

Parsing Gigabytes of JSON per Second
arXiv - CS - Performance, Pub Date: 2019-02-22, DOI: arxiv-1902.08318
Geoff Langdale, Daniel Lemire

JavaScript Object Notation or JSON is a ubiquitous data exchange format on the Web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible. Despite the maturity of the problem of JSON parsing, we show that substantial speedups are possible. We present the first standard-compliant JSON parser to process gigabytes of data per second on a single core, using commodity processors. We can use a quarter or fewer instructions than a state-of-the-art reference parser like RapidJSON. Unlike other validating parsers, our software (simdjson) makes extensive use of Single Instruction, Multiple Data (SIMD) instructions. To ensure reproducibility, simdjson is freely available as open-source software under a liberal license.

Updated: 2020-01-03
