REINFORCE: Achieving Efficient Failure Resiliency for Network Function Virtualization-Based Services
IEEE/ACM Transactions on Networking (IF 3.7), Pub Date: 2020-02-21, DOI: 10.1109/tnet.2020.2969961
Sameer G. Kulkarni, Guyue Liu, K. K. Ramakrishnan, Mayutan Arumaithurai, Timothy Wood, Xiaoming Fu

Ensuring high availability (HA) for software-based networks is a critical design feature that will help the adoption of software-based network functions (NFs) in production networks. It is important for NFs to avoid outages and maintain mission-critical operations. However, HA support for NFs on the critical data path can result in unacceptable performance degradation. We present REINFORCE, an integrated framework to support efficient resiliency for NF service chains. REINFORCE includes timely failure detection and consistent failover mechanisms. REINFORCE replicates state to standby NFs (local and remote) while enforcing correctness. It minimizes the number of state transfers by exploiting the concept of external synchrony, and leverages opportunistic batching and multi-buffering to optimize performance. Experimental results show that, even at line-rate packet processing (10 Gbps), REINFORCE achieves chain-level failover across servers in a LAN within 10 ms, incurring less than 10% performance overhead, and adds an average latency of only $\sim 400~\mu\text{s}$, with a worst-case latency of less than 1 ms. REINFORCE also recovers from software failures within the same node in less than $100~\mu\text{s}$, incurring less than 1% performance overhead and adding less than $5~\mu\text{s}$ of latency during normal operation.
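
The abstract's idea of external synchrony with batching can be illustrated with a minimal sketch. This is not REINFORCE's implementation. The class `ExternallySynchronousNF`, the `replicate` callback, and the per-flow counter state are all hypothetical names chosen for illustration. The sketch assumes only the general principle: output packets are held until the state they depend on has reached a standby, so one state transfer can cover a whole batch of packets instead of one transfer per packet.

```python
from collections import deque


class ExternallySynchronousNF:
    """Toy NF that holds output until the state it depends on is replicated.

    Illustrative only: names and state layout are assumptions, not taken
    from the REINFORCE paper.
    """

    def __init__(self, replicate):
        self.replicate = replicate  # hypothetical callback: ships a state snapshot to a standby
        self.state = {}             # per-flow packet counters (illustrative NF state)
        self.dirty = False          # True when state changed since the last transfer
        self.held = deque()         # output packets not yet safe to release
        self.transfers = 0          # number of state transfers performed so far

    def process(self, flow_id, packet):
        """Update local state and hold the packet instead of emitting it."""
        self.state[flow_id] = self.state.get(flow_id, 0) + 1
        self.dirty = True
        self.held.append(packet)

    def flush(self):
        """Replicate once for the whole batch, then release the held packets."""
        if self.dirty:
            self.replicate(dict(self.state))  # send a copy, not a live reference
            self.transfers += 1
            self.dirty = False
        released = list(self.held)
        self.held.clear()
        return released


# Usage: five packets are processed, but only one state transfer occurs.
standby_log = []
nf = ExternallySynchronousNF(standby_log.append)
for i in range(5):
    nf.process("flowA", f"pkt{i}")
released = nf.flush()
```

In this sketch, nothing leaves the NF before `flush()` runs, so an external observer never sees output that reflects unreplicated state. The cost is added latency for held packets, which is the trade-off the abstract's latency figures quantify for the real system.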

Updated: 2020-04-22