Improving in-memory file system reading performance by fine-grained user-space cache mechanisms
Journal of Systems Architecture ( IF 4.5 ) Pub Date : 2021-01-13 , DOI: 10.1016/j.sysarc.2021.101994
Rong Gu , Chongjie Li , Haipeng Dai , Yili Luo , Xiaolong Xu , Shaohua Wan , Yihua Huang

Nowadays, as the memory capacity of servers becomes larger and larger, distributed in-memory file systems, which enable applications to interact with data at high speed, have been widely adopted. However, existing distributed in-memory file systems still suffer from low data access performance when reading small data, which seriously reduces their usefulness in many important big data scenarios. In this paper, we analyze the factors that affect the performance of reading in-memory files and propose a two-layer user-space cache management mechanism: in the first layer, we cache data packet references to reduce frequent page fault interruptions (packet-level cache); in the second layer, we cache and manage small file data units to avoid redundant inter-process communication (object-level cache). We further design a fine-grained caching model based on submodular function optimization theory to efficiently manage variable-length cache units with partially overlapping fragments on the client side. Experimental results on synthetic and real-world workloads show that, compared with existing state-of-the-art systems, the first-layer cache doubles reading performance on average, and the second-layer cache improves random reading performance by more than 4 times. Our caching strategies also outperform state-of-the-art cache algorithms by more than 20% in hit ratio. Furthermore, the proposed client-side caching framework has been adopted by the Alluxio open source community, which demonstrates the practical value of this work.
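To illustrate the two-layer idea described above, the following is a minimal sketch of a client-side cache with a bounded packet-reference layer and an LRU object layer for small files. All class, method, and parameter names here are hypothetical illustrations, not the authors' actual API or the Alluxio implementation; the eviction policies are deliberately simple stand-ins for the paper's submodular-optimization-based management.

```python
from collections import OrderedDict

class TwoLayerReadCache:
    """Illustrative sketch (not the paper's code) of a two-layer user-space cache.

    Layer 1 (packet-level): holds references to recently read data packets,
    so repeated reads of the same region avoid page-fault-inducing re-reads.
    Layer 2 (object-level): an LRU cache of whole small-file payloads, so
    repeated small-file reads avoid inter-process communication entirely.
    """

    def __init__(self, packet_slots=64, object_capacity_bytes=4 << 20):
        self.packet_cache = OrderedDict()   # (file_id, packet_idx) -> packet reference
        self.packet_slots = packet_slots
        self.object_cache = OrderedDict()   # file_id -> full small-file payload
        self.object_capacity = object_capacity_bytes
        self.object_used = 0

    def read_small_file(self, file_id, fetch_fn):
        """Object-level path: serve a whole small file from cache when possible."""
        if file_id in self.object_cache:
            self.object_cache.move_to_end(file_id)      # LRU touch
            return self.object_cache[file_id]
        data = fetch_fn(file_id)                        # stands in for an IPC round trip
        self.object_cache[file_id] = data
        self.object_used += len(data)
        while self.object_used > self.object_capacity:  # evict least-recently-used objects
            _, evicted = self.object_cache.popitem(last=False)
            self.object_used -= len(evicted)
        return data

    def read_packet(self, file_id, packet_idx, fetch_fn):
        """Packet-level path: keep a bounded set of packet references."""
        key = (file_id, packet_idx)
        if key in self.packet_cache:
            self.packet_cache.move_to_end(key)
            return self.packet_cache[key]
        packet = fetch_fn(file_id, packet_idx)
        self.packet_cache[key] = packet
        if len(self.packet_cache) > self.packet_slots:  # drop the oldest reference
            self.packet_cache.popitem(last=False)
        return packet
```

A second read of the same small file hits the object layer and triggers no fetch, which is the effect the abstract attributes to avoiding redundant inter-process communication.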




Updated: 2021-01-24