Parsing gigabytes of JSON per second
The VLDB Journal (IF 2.8), published 2019-10-11, DOI: 10.1007/s00778-019-00578-5
Geoff Langdale, Daniel Lemire

JavaScript Object Notation (JSON) is a ubiquitous data exchange format on the web. Because of the sheer volume of data involved, ingesting JSON documents can become a performance bottleneck, which motivates us to make JSON parsing as fast as possible. Although JSON parsing is a mature problem, we show that substantial speedups are still possible. We present the first standard-compliant JSON parser to process gigabytes of data per second on a single core of a commodity processor. It can use a quarter or fewer of the instructions required by a state-of-the-art reference parser such as RapidJSON. Unlike other validating parsers, our software (simdjson) makes extensive use of single instruction, multiple data (SIMD) instructions. To ensure reproducibility, simdjson is freely available as open-source software under a liberal license.
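The abstract does not spell out how SIMD instructions help a parser, so the following is only a minimal scalar sketch of the kind of bitmask technique SIMD parsers rely on, not simdjson's implementation. It assumes input is processed in 64-byte blocks, that each byte class (quotes, structural characters) becomes one bit in a 64-bit mask, and that strings contain no backslash escapes (a real parser must handle those). The function names `char_mask`, `prefix_xor` and `structural_positions` are hypothetical. A vectorized version would replace the per-byte loop in `char_mask` with SIMD byte comparisons.

```python
# Scalar emulation of bitmask-based structural indexing over one 64-byte block.
# Assumption (hypothetical simplification): no backslash escapes inside strings.

MASK64 = (1 << 64) - 1


def char_mask(block: bytes, targets: bytes) -> int:
    """Return a 64-bit mask with bit i set when block[i] is one of `targets`.

    A SIMD parser would compute this with vector byte comparisons
    instead of a Python loop.
    """
    mask = 0
    for i, b in enumerate(block):
        if b in targets:  # `int in bytes` tests membership of the byte value
            mask |= 1 << i
    return mask


def prefix_xor(x: int) -> int:
    """Bit i of the result is the XOR of bits 0..i of x (inclusive scan)."""
    for shift in (1, 2, 4, 8, 16, 32):
        x ^= (x << shift) & MASK64
    return x


def structural_positions(block: bytes) -> list:
    """Indices of {}[]:, characters that lie outside string literals."""
    assert len(block) <= 64, "this sketch handles a single 64-byte block"
    quotes = char_mask(block, b'"')
    # Bits are set from an opening quote up to, but not including, its closing quote.
    in_string = prefix_xor(quotes)
    operators = char_mask(block, b"{}[]:,")
    structural = operators & ~in_string & MASK64
    return [i for i in range(64) if (structural >> i) & 1]
```

For example, `structural_positions(b'{"a":[1,2],"b:c":3}')` returns `[0, 4, 5, 7, 9, 10, 16, 18]`: the colon inside the string `"b:c"` (index 13) is masked out because an odd number of quotes precedes it. The design choice is that every step is a branch-free bitwise operation on whole words, which is what makes such a scheme a good fit for wide registers.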

Updated: 2019-10-11