A data grouping model based on cache transaction for unstructured data storage systems, International Journal of Intelligent Systems


International Journal of Intelligent Systems (IF 5.0), Pub Date: 2021-11-03, DOI: 10.1002/int.22728
Zhu Dongjie₁, Du Haiwen₂, Sun Yundong₂, Tian Zhaoshuo₂, Cao Ning₃


Cache prefetching has become the mainstream data access optimization strategy in Industrial Intelligent Systems (IIS) and data centers. However, the rapid growth of unstructured data generates a massive number of pairwise access relationships. As a result, researchers have had to choose between spatial locality and temporal locality to keep the computational complexity acceptable. We propose a cache-transaction-based data grouping model (CTDGM) that addresses this problem by optimizing the feature representation method and the grouping efficiency. First, we define the cache transaction and propose a method for extracting the cache transaction feature (CTF). Second, we design a data chunking algorithm based on CTF and spatiotemporal locality to make the relationship calculation more efficient. Third, we propose CTDGM, which constructs a relation graph and partitions the data into independent groups according to the strength of the data access relations. In our experiments, compared with state-of-the-art and traditional methods, our algorithm increases the cache hit rate by 5%–20% on average on the MSR, VDI-LUN, and KC data sets, which in turn reduces the number of data I/O accesses by 30%–60%.
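The abstract does not give the formal definitions, so the following is only a rough illustration of the general idea of grouping data by access-relation strength, not the authors' CTDGM. It assumes (hypothetically) that a transaction is a run of accesses separated by gaps no larger than `max_gap`, that relation strength is a simple co-occurrence count, and that groups are the connected components of the graph restricted to edges with strength at least `min_strength`. All function and parameter names are invented for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def split_transactions(trace, max_gap):
    """Split a time-ordered list of (timestamp, block_id) into transactions.

    Assumption (not from the paper): a new transaction starts whenever the
    gap between two consecutive accesses exceeds max_gap.
    """
    transactions, current, last_ts = [], set(), None
    for ts, block in trace:
        if last_ts is not None and ts - last_ts > max_gap and current:
            transactions.append(current)
            current = set()
        current.add(block)
        last_ts = ts
    if current:
        transactions.append(current)
    return transactions


def group_blocks(transactions, min_strength):
    """Group blocks whose co-access count reaches min_strength.

    Assumption: relation strength is the number of transactions in which
    two blocks appear together; the paper's CTF is likely richer than this.
    """
    strength = defaultdict(int)
    for tx in transactions:
        for a, b in combinations(sorted(tx), 2):
            strength[(a, b)] += 1

    parent = {}  # union-find forest over block ids

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for tx in transactions:
        for block in tx:
            find(block)  # register every block, even unlinked ones
    for (a, b), count in strength.items():
        if count >= min_strength:
            parent[find(a)] = find(b)  # merge strongly related blocks

    groups = defaultdict(set)
    for block in list(parent):
        groups[find(block)].add(block)
    return sorted(sorted(g) for g in groups.values())


# Toy trace: blocks "a" and "b" are accessed together twice, "c" only once with "a".
trace = [(0, "a"), (1, "b"), (10, "a"), (11, "b"), (20, "c"), (21, "a")]
groups = group_blocks(split_transactions(trace, max_gap=5), min_strength=2)
```

On this toy trace, `a` and `b` end up in one group and `c` stays on its own, because the `a`-`c` relation falls below the threshold. A prefetcher built on such groups would load the rest of a group when one member is requested; how the paper actually chunks data and weighs relations is not specified in the abstract.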


Updated: 2021-11-03
