A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers, IEEE Transactions on Parallel and Distributed Systems


IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-02-01, DOI: 10.1109/tpds.2020.3012929
MD S Q Zulkar Nine, Tevfik Kosar

The amount of data transferred over dedicated and non-dedicated network links has been increasing much faster than the increase in the network capacity. On the other hand, the current data transfer solutions fail to guarantee even the promised achievable transfer throughput. In this article, we propose a novel two-phase dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In the offline analysis, we mine historical transfer logs to perform knowledge discovery about the transfer characteristics. The online phase uses the discovered knowledge from the offline analysis along with the real-time investigation of the network condition to optimize the protocol parameters. As the real-time investigation is expensive and provides partial knowledge about the current network status, our model uses historical knowledge about the network and data characteristics to reduce the real-time investigation overhead while ensuring near-optimal throughput for each transfer. Our novel approach is tested over different networks with different datasets, and it has outperformed its closest competitor by 1.7x and the default case by 5x. It also achieved up to 93 percent accuracy compared to the optimal achievable throughput possible on those networks.
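The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the log record layout, the single tuned parameter (`concurrency`), the function names `offline_phase` and `online_phase`, and the `probe` callback are all hypothetical, chosen only to show how historical knowledge might narrow the set of real-time probes.

```python
from collections import defaultdict
from statistics import mean


def offline_phase(logs):
    """Rank concurrency levels per file-size class by mean historical throughput.

    `logs` is an iterable of (size_class, concurrency, throughput_mbps)
    tuples; this record layout is an assumption made for illustration.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for size_class, concurrency, throughput in logs:
        samples[size_class][concurrency].append(throughput)

    knowledge = {}
    for size_class, by_cc in samples.items():
        # Best historical setting first.
        knowledge[size_class] = sorted(
            by_cc, key=lambda cc: mean(by_cc[cc]), reverse=True
        )
    return knowledge


def online_phase(knowledge, size_class, probe, max_probes=2, default=1):
    """Probe only the top historical candidates and keep the best one now.

    `probe(concurrency)` stands in for a short sample transfer that returns
    the throughput currently observed on the network (hypothetical callback).
    """
    candidates = knowledge.get(size_class, [default])[:max_probes]
    measured = {cc: probe(cc) for cc in candidates}
    return max(measured, key=measured.get)


# Synthetic history: concurrency 8 was best on average for large files.
logs = [
    ("large", 1, 100.0),
    ("large", 4, 380.0),
    ("large", 8, 420.0),
    ("large", 8, 400.0),
    ("small", 2, 50.0),
]
knowledge = offline_phase(logs)

# Synthetic current conditions: the network now favors concurrency 4.
current = {1: 90.0, 4: 350.0, 8: 300.0}
best = online_phase(knowledge, "large", probe=lambda cc: current[cc])
print(best)
```

In this toy setup the history limits the online search to two candidates instead of three, and the live probe then corrects the historical favorite when current conditions differ, which is the trade-off between probing overhead and throughput that the abstract describes.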


Updated: 2021-02-01
