ODDS: Optimizing Data-locality Access for Scientific Data Analysis, IEEE Transactions on Cloud Computing


ODDS: Optimizing Data-locality Access for Scientific Data Analysis
IEEE Transactions on Cloud Computing (IF 6.5), Pub Date: 2020-01-01, DOI: 10.1109/tcc.2017.2754484
Jun Wang, Dezhi Han, Jiangling Yin, Xiaobo Zhou, Changjun Jiang

Whereas traditional scientific applications are computationally intensive, recent applications require more data-intensive analysis and visualization to extract knowledge from the explosive growth of scientific information and simulation data. As the computational power and size of compute clusters continue to increase, I/O read rates and the associated network bandwidth for these data-intensive applications have not kept pace. These applications suffer from long I/O latency because “big data” must be moved from the network/parallel file system to the compute nodes, which results in a serious performance bottleneck. To address this problem, we propose a novel approach called “ODDS” to optimize data-locality access in scientific data analysis and visualization. ODDS leverages a distributed file system (DFS) to provide scalable data access for scientific analysis. By exploiting information about the underlying data distribution in the DFS, ODDS employs a novel data-locality scheduler that transforms a compute-centric mapping into a data-centric one and enables each computational process to access the data it needs from a local or nearby storage node. ODDS is suitable both for parallel applications with dynamic process-to-data scheduling and for applications with static process-to-data assignment. To demonstrate the efficacy of our methods, we present and evaluate ODDS in the context of two state-of-the-art scientific-analysis applications, mpiBLAST and ParaView, together with the Hadoop distributed file system (HDFS), across a wide variety of computing platform settings. In comparison to existing deployments using NFS, PVFS, or Lustre as the underlying storage systems, ODDS can greatly reduce the I/O cost and double overall performance.
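The idea of turning a compute-centric mapping into a data-centric one can be illustrated with a small sketch. This is not the scheduler described in the paper: it is a simple greedy heuristic, and the names `assign_chunks`, `local_fraction`, the chunk IDs, the node names, and the even-share quota are all hypothetical choices made for illustration. It assumes the replica locations of each data chunk are already known (in a real deployment they would come from the DFS metadata) and that each process is pinned to one node.

```python
def assign_chunks(chunk_locations, process_nodes):
    """Greedily map chunks to processes, preferring a process on a replica's node.

    chunk_locations: dict of chunk id -> list of nodes holding a replica (assumed input)
    process_nodes:   dict of process rank -> node the process runs on (assumed input)
    Returns a dict of rank -> list of chunk ids.
    """
    # Even-share cap (illustrative assumption): ceil(n_chunks / n_procs) per process.
    quota = -(-len(chunk_locations) // len(process_nodes))
    assignment = {rank: [] for rank in process_nodes}

    # Index processes by the node they run on.
    ranks_on_node = {}
    for rank, node in process_nodes.items():
        ranks_on_node.setdefault(node, []).append(rank)

    leftover = []
    # Place chunks with the fewest replicas first, since they have the fewest local options.
    for chunk in sorted(chunk_locations, key=lambda c: (len(chunk_locations[c]), c)):
        local_ranks = [
            rank
            for node in chunk_locations[chunk]
            for rank in ranks_on_node.get(node, [])
            if len(assignment[rank]) < quota
        ]
        if local_ranks:
            # Pick the least-loaded co-located process (ties broken by rank).
            best = min(local_ranks, key=lambda r: (len(assignment[r]), r))
            assignment[best].append(chunk)
        else:
            leftover.append(chunk)

    # Chunks with no available local process fall back to the least-loaded process.
    for chunk in leftover:
        best = min(process_nodes, key=lambda r: (len(assignment[r]), r))
        assignment[best].append(chunk)
    return assignment


def local_fraction(assignment, chunk_locations, process_nodes):
    """Fraction of assigned chunks that are read from the process's own node."""
    total = sum(len(chunks) for chunks in assignment.values())
    local = sum(
        1
        for rank, chunks in assignment.items()
        for chunk in chunks
        if process_nodes[rank] in chunk_locations[chunk]
    )
    return local / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical layout: four chunks, four single-process nodes.
    chunks = {"c0": ["n0", "n1"], "c1": ["n0"], "c2": ["n1", "n2"], "c3": ["n2"]}
    procs = {0: "n0", 1: "n1", 2: "n2", 3: "n3"}
    plan = assign_chunks(chunks, procs)
    print(plan, local_fraction(plan, chunks, procs))
```

In this toy layout three of the four chunks end up on a process that holds a local replica, while the process on `n3`, which stores no data, has to read its chunk remotely. The quota trades some locality for load balance; a scheduler for dynamic process-to-data assignment would instead make this choice each time a process requests new work.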


Last updated: 2020-01-01
