The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes
Big Data Research (IF 3.5), Pub Date: 2021-01-26, DOI: 10.1016/j.bdr.2021.100205
Ciprian-Octavian Truică, Elena-Simona Apostol, Jérôme Darmont, Torben Bach Pedersen

In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially IoT sensors and actuators, use either the XML or the JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized query languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Database Systems. Currently, the majority of these solutions have been replaced with the more modern JSON-based Database Management Systems. However, we believe that XML-based solutions can still perform well when executing complex queries on heterogeneous collections. Unfortunately, current research lacks a clear comparison of the scalability and performance of database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison of selected Document-Oriented Database Systems that use either the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences, we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus.




Updated: 2021-02-02