On the performance of learned data structures
Theoretical Computer Science (IF 1.1), Pub Date: 2021-04-28, DOI: 10.1016/j.tcs.2021.04.015
Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra

A recent trend in algorithm design consists of augmenting classic data structures with machine learning models, which are better suited to reveal and exploit patterns and trends in the input data so as to achieve outstanding practical improvements in space occupancy and time efficiency. This is especially well known in the context of indexing data structures for big data where, despite a few attempts at evaluating their asymptotic efficiency, theoretical results showing that learned indexes are provably better than classic indexes, such as B-trees and their variants, are still missing. In this paper, we present the first mathematically grounded answer to this problem by exploiting a link with a mean exit time problem over a proper stochastic process which, we show, is related to the space and time complexity of these learned indexes. As a corollary of this general analysis, we show that plugging this result into the (learned) PGM-index yields a learned data structure which is provably better than B-trees.
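
To give a feel for what a mean exit time problem is, the following minimal sketch estimates how many steps a symmetric ±1 random walk started at 0 needs, on average, to leave the strip (-eps, eps). This toy walk is an assumption chosen for illustration and is not the stochastic process analysed in the paper; the function name `mean_exit_time` and its parameters are hypothetical. For this particular walk the classical result is that the mean exit time equals eps squared, so the estimate can be compared against that value.

```python
import random


def mean_exit_time(eps, trials=2000, seed=0):
    """Estimate the mean number of steps a symmetric +/-1 random walk,
    started at 0, takes to leave the open strip (-eps, eps).

    Illustrative toy model only (an assumption, not the paper's process).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    total_steps = 0
    for _ in range(trials):
        position = 0
        steps = 0
        # Keep walking while the position is strictly inside the strip.
        while abs(position) < eps:
            position += rng.choice((-1, 1))
            steps += 1
        total_steps += steps
    return total_steps / trials


if __name__ == "__main__":
    # Print the Monte Carlo estimate next to the classical value eps**2.
    for eps in (4, 8, 16):
        print(eps, mean_exit_time(eps), eps * eps)
```

Doubling the strip half-width roughly quadruples the estimated exit time in this toy setting, which is the kind of scaling behaviour a mean exit time analysis is designed to capture.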




Updated: 2021-05-18