T-RACKs: A Faster Recovery Mechanism for TCP in Data Center Networks, IEEE/ACM Transactions on Networking


IEEE/ACM Transactions on Networking (IF 3.0), Pub Date: 2021-03-08, DOI: 10.1109/tnet.2021.3059913
Ahmed M. Abdelmoniem, Brahim Bensaou

Interactive, data-driven cloud applications generate swarms of small TCP flows that compete for the small switch buffer space in data centers. Such applications require a small flow completion time (FCT) to be effective. Unfortunately, TCP is myopic with respect to the composite nature of application data. In addition, it tends to artificially inflate the FCT of individual flows by several orders of magnitude, because its Internet-centric design fixes the retransmission timeout (RTO) to be at least hundreds of milliseconds. To better understand this problem, in this paper we use empirical measurements in a small data center testbed to study, at a microscopic level, the effects of various types of packet losses on TCP's performance. In particular, we single out packet losses that impact the tail end of small flows, as well as bursty losses that span a significant fraction of small TCP congestion windows, and show that such losses have a non-negligible effect on the FCT. Based on this, we propose timely-retransmitted ACKs (T-RACKs), a simple loss recovery mechanism that conceals the drawbacks of the long RTO even in the presence of heavy packet losses. Notably, T-RACKs achieves this transparently to TCP itself, as it does not require any change to TCP in the tenant's virtual machine (VM) or container. T-RACKs can be implemented as a software shim layer in the hypervisor between the VMs and the server's NIC, or in hardware as a networking function in a SmartNIC. Simulation and real testbed results show remarkable performance improvements.
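
To make the FCT inflation claim concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes a small flow that needs a few round trips and suffers a single loss that can only be recovered by a timer. All numbers (the 100 µs round-trip time, the 3 round trips, the 200 ms minimum RTO, and the 1 ms shorter recovery timer) and the helper names `fct_seconds` and `inflation` are illustrative assumptions, not values or identifiers from the abstract.

```python
def fct_seconds(num_rounds: int, rtt: float, recovery_delay: float = 0.0) -> float:
    """Flow completion time: round trips needed plus one loss-recovery stall."""
    return num_rounds * rtt + recovery_delay


def inflation(fct: float, baseline: float) -> float:
    """How many times longer a flow takes compared with the loss-free baseline."""
    return fct / baseline


# All values below are assumptions chosen for illustration only.
RTT = 100e-6        # assumed intra-data-center round-trip time (100 microseconds)
ROUNDS = 3          # assumed number of round trips a small flow needs
MIN_RTO = 200e-3    # assumed minimum RTO ("hundreds of milliseconds")
SHORT_TIMER = 1e-3  # hypothetical recovery timer on the order of a few RTTs

ideal = fct_seconds(ROUNDS, RTT)                    # no loss
with_rto = fct_seconds(ROUNDS, RTT, MIN_RTO)        # loss recovered by the RTO
with_short = fct_seconds(ROUNDS, RTT, SHORT_TIMER)  # loss recovered by a short timer

print(f"loss-free FCT:        {ideal * 1e3:.3f} ms")
print(f"FCT with RTO stall:   {with_rto * 1e3:.3f} ms "
      f"({inflation(with_rto, ideal):.0f}x)")
print(f"FCT with short timer: {with_short * 1e3:.3f} ms "
      f"({inflation(with_short, ideal):.1f}x)")
```

Under these assumed numbers a single timeout dominates the completion time by a factor of several hundred, whereas a recovery delay closer to the data center round-trip time keeps the penalty within a small multiple of the loss-free FCT. This is the gap that a faster recovery mechanism of the kind described in the abstract aims to close.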


Updated: 2021-03-08
