Data prefetch for fast NDN software routers based on hash table-based forwarding tables
Introduction
A software router, built on a hardware platform based on a commercial off-the-shelf (COTS) computer, has become feasible thanks to recent advances in multi-core CPUs and fast networking technologies for COTS computers. For notational simplicity, hereinafter, a COTS computer is simply referred to as a computer. Fast IP packet forwarding has been enabled by storing data structures for IP forwarding on a fast Static Random Access Memory (SRAM) device [1], [2]. Compact data structures have become a research issue due to the need to keep up with the increasing number of IP prefixes in the Internet. Most studies have addressed compact trie-based data structures, such as multi-bit tries [3], by replacing consecutive elements in a trie with a single element. Such efforts make it possible to store the growing number of IP prefixes, e.g., 7 × 10⁵ prefixes [4], on the latest SRAM devices.
Recently, studies on high-speed algorithms and compact data structures have been revisited due to the emergence of a new Internet architecture called Named Data Networking (NDN) [5], wherein rich data, including large video data and small sensor data, are delivered by a single architecture. Fast NDN packet forwarding is not trivial compared with IP packet forwarding in terms of memory space: name-based forwarding needs a larger forwarding table than an IP one in order to store about 2.1 × 10⁸ name prefixes [6], and per-packet caching requires additional memory space to store packets. Therefore, NDN data structures need to be stored on a slow Dynamic Random Access Memory (DRAM) device rather than an SRAM device even if compact trie-based data structures are used to store the forwarding table [7]. This implies that hiding the latency of accessing NDN data structures on the DRAM device, rather than compacting those data structures, is the key to fast NDN packet forwarding.
Following this implication, in our previous paper [8], we first identified the DRAM access latency of NDN data structures as the true bottleneck of a current state-of-the-art NDN software implementation [9] through an analysis conducted at the level of the CPU instruction pipeline. We then proposed a prefetch algorithm for NDN data structures to hide this latency. The key idea is to handle multiple consecutive packets in a batch so that prefetching these data structures from a DRAM device overlaps with the computation for other packets. We developed the prefetch algorithm for two consecutive packets and experimentally demonstrated that it successfully hides most of the DRAM access latency, so that the NDN packet forwarding rate increases linearly with the number of CPU cores.
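The key idea, overlapping the fetch for one packet with the computation for another, can be illustrated with a minimal sketch. The FIB layout, the FNV-1a hash, and the helper names below are hypothetical simplifications, not the paper's implementation:

```c
/* Minimal sketch of two-packet batch processing with software prefetch.
 * All data structures here are illustrative, not the paper's code. */
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024

struct fib_entry {
    char prefix[32];
    int  next_hop;
    int  valid;
};

static struct fib_entry fib[TABLE_SIZE];

/* Toy hash (FNV-1a); a real router would use a stronger keyed hash. */
static uint32_t hash_name(const char *name) {
    uint32_t h = 2166136261u;
    for (; *name; name++) { h ^= (uint8_t)*name; h *= 16777619u; }
    return h % TABLE_SIZE;
}

static int lookup(uint32_t idx, const char *name) {
    struct fib_entry *e = &fib[idx];
    return (e->valid && strcmp(e->prefix, name) == 0) ? e->next_hop : -1;
}

/* Process two packets in a batch: both bucket addresses are computed and
 * prefetched before either lookup touches the data, so the two DRAM
 * fetches overlap with each other and with the hash computation. */
void forward_batch(const char *names[2], int out[2]) {
    uint32_t idx0 = hash_name(names[0]);
    uint32_t idx1 = hash_name(names[1]);
    __builtin_prefetch(&fib[idx0], 0, 3);
    __builtin_prefetch(&fib[idx1], 0, 3);
    out[0] = lookup(idx0, names[0]);
    out[1] = lookup(idx1, names[1]);
}
```

Handled packet-by-packet, the bucket fetch of each packet would stall the pipeline; batching lets the prefetch of one packet proceed while the other is being hashed and compared.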
In this paper, we extend our previous studies [8], [10] in terms of algorithm design and performance analysis to evaluate the prefetch algorithm clearly and in depth. The main contributions of this paper can be summarized as follows:
- •
The design rationale of the prefetch algorithm is strengthened by carefully choosing a data structure appropriate for hiding DRAM access latency. We evaluate three representative data structures for name-based forwarding, a hash table [9], a trie [7] and a Bloom filter [11], from the perspective of how easily DRAM access latency can be hidden. The evaluation metric is the number of dependent DRAM accesses, i.e., accesses that must be performed sequentially. The quantitative comparison of the three data structures reveals that a hash table incurs the smallest number of dependent DRAM accesses and is thus the best of the three for hiding DRAM access latency.
- •
We design a prefetch algorithm for hash table-based forwarding tables that handles two consecutive packets in a batch. This algorithm hides the DRAM access latency of fetching a hash entry, which is difficult to hide with the prefetch instruction alone when packets are handled one by one.
- •
We design a sophisticated prefetch algorithm for Longest Prefix Matching (LPM) in the FIB, which is potentially time-consuming. The algorithm avoids calculating the memory addresses of FIB entries whose prefixes are shorter than the matching prefix.
- •
We evaluate representative cache eviction and admission algorithms from the perspectives of both hiding DRAM access latency and achieving high cache hit rates. The evaluation reveals that TinyLFU [12] with FIFO eviction is an appropriate cache algorithm because this combination simultaneously achieves a small number of dependent DRAM accesses and high cache hit rates.
- •
We experimentally show that processing two consecutive packets in a batch is sufficient for hiding DRAM access latency on modern computers. We also experimentally show that increasing the number of consecutive packets in a batch hides DRAM access latency even in cases where the latency becomes large.
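To make the LPM-related contributions above concrete, here is a minimal sketch of longest name prefix matching over a hash-table FIB in which the buckets of all candidate prefixes are hashed and prefetched up front, so their fetches overlap before the longest-first probing begins. The flat open-addressed table and all names are illustrative assumptions, not the paper's code:

```c
/* Sketch: LPM over a hash-table FIB with up-front bucket prefetching.
 * Illustrative only; real FIBs need collision handling and keyed hashes. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024
#define MAX_COMPONENTS 8

struct fib_entry { char prefix[64]; int next_hop; int valid; };
static struct fib_entry fib[TABLE_SIZE];

static uint32_t hash_name(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= (uint8_t)s[i]; h *= 16777619u; }
    return h % TABLE_SIZE;
}

void fib_insert(const char *prefix, int next_hop) {
    uint32_t idx = hash_name(prefix, strlen(prefix));
    strcpy(fib[idx].prefix, prefix);
    fib[idx].next_hop = next_hop;
    fib[idx].valid = 1;
}

int fib_lpm(const char *name) {
    size_t lens[MAX_COMPONENTS];
    uint32_t idxs[MAX_COMPONENTS];
    int n = 0;

    /* Boundaries of every '/'-delimited prefix, shortest to longest. */
    for (size_t i = 1; name[i] != '\0'; i++)
        if (name[i] == '/' && n < MAX_COMPONENTS)
            lens[n++] = i;
    if (n < MAX_COMPONENTS)
        lens[n++] = strlen(name);

    /* Hash all candidates and prefetch their buckets in one burst. */
    for (int k = 0; k < n; k++) {
        idxs[k] = hash_name(name, lens[k]);
        __builtin_prefetch(&fib[idxs[k]], 0, 3);
    }

    /* Probe longest prefix first; stop at the first match, so entries
     * for shorter prefixes are never examined further. */
    for (int k = n - 1; k >= 0; k--) {
        struct fib_entry *e = &fib[idxs[k]];
        if (e->valid && strncmp(e->prefix, name, lens[k]) == 0 &&
            e->prefix[lens[k]] == '\0')
            return e->next_hop;
    }
    return -1;   /* no matching prefix */
}
```

Because every candidate bucket address is computable from the name alone, all prefetches can be issued before any probe, which is exactly what makes hash tables friendlier to latency hiding than pointer-chasing structures.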
The rest of this paper is organized as follows. First, after explaining the state-of-the-art software NDN implementation used for the analysis in Section 2, we identify the true bottleneck for high-speed forwarding based on an instruction-level analysis in Section 3. In Sections 4 and 5, we choose data structures and algorithms for name-based packet forwarding and caching among existing algorithms from the viewpoint of the ease of hiding DRAM access latency. We then design a prefetch algorithm and evaluate it by implementing a proof-of-concept prototype in Section 6. We briefly introduce related work in Section 7 and conclude the paper in Section 8.
Section snippets
Hardware and software platforms
After describing a hardware platform, this section summarizes software design practices to exploit the parallel computation capabilities of CPU cores, which are essential to high-speed NDN packet forwarding.
Microarchitectural bottleneck analysis
To identify a true bottleneck of NDN packet forwarding on a computer, we conduct a microarchitectural analysis, which analyzes how the individual hardware components of the CPU spend time in the processing of NDN packets at the level of instructions and instruction pipelines.
Overview
In this section, we choose a FIB data structure appropriate for hiding DRAM access latency from among three representative FIB data structures, i.e., a hash table-based FIB [7], a Bloom filter-based FIB [11] and a trie-based one [28], by comparing their average numbers of dependent DRAM accesses for longest name prefix matching on a queried name. The word “dependent” means that the address of a dependent data piece is not determined until the data piece that contains the pointer to it has been fetched from DRAM.
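The notion of dependent accesses can be illustrated by contrasting a trie walk, where each node's address becomes known only after the previous node has been fetched, with a hash-table lookup, whose bucket address is computed directly from the key. The sketch below uses hypothetical structures and simply counts such accesses:

```c
/* Sketch contrasting dependent DRAM accesses in a binary trie walk
 * versus a direct hash-table lookup. Illustrative structures only. */
#include <stddef.h>
#include <stdint.h>

struct trie_node { struct trie_node *child[2]; int next_hop; };

/* Each iteration dereferences the node fetched in the previous one:
 * these accesses are dependent and cannot overlap with one another. */
int trie_lookup(struct trie_node *root, uint32_t key, int depth, int *accesses) {
    struct trie_node *n = root;
    int hop = -1;
    for (int i = 0; i < depth && n; i++) {
        (*accesses)++;                 /* one dependent access per level */
        if (n->next_hop >= 0)
            hop = n->next_hop;         /* remember longest match so far */
        n = n->child[(key >> (depth - 1 - i)) & 1];
    }
    return hop;
}

/* The bucket's address is h(key): a single dependent access suffices. */
struct bucket { uint32_t key; int next_hop; };
int hash_lookup(struct bucket *table, size_t size, uint32_t key, int *accesses) {
    struct bucket *b = &table[key % size];
    (*accesses)++;                     /* the only dependent access */
    return b->key == key ? b->next_hop : -1;
}
```

For a trie of depth d the walk costs d sequential fetches, while the hash table costs one, which is why the comparison in this section favors hash tables.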
Overview
In this section, we choose a cache algorithm appropriate for hiding DRAM access latency from among representative cache eviction and cache admission algorithms. Cache eviction determines a victim, i.e., a Data packet evicted from the CS, whereas cache admission decides whether a new Data packet is inserted into the CS or not. Usually, a cache admission algorithm is combined with a simple FIFO-based cache eviction algorithm. An important difference from the choice of FIB data structures ...
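As a rough illustration of how an admission algorithm pairs with FIFO eviction, the sketch below implements a TinyLFU-style policy [12]: a frequency sketch estimates each item's popularity, and a newcomer displaces the FIFO victim only if the newcomer is estimated to be more popular. The single-row counter array and the hash here are simplifying assumptions; TinyLFU itself uses a count-min sketch with a doorkeeper and periodic aging:

```c
/* Sketch of TinyLFU-style admission over a FIFO cache. Illustrative only. */
#include <stdint.h>

#define SKETCH_SIZE 256
#define CACHE_SIZE 4

/* One row of saturating counters stands in for the count-min sketch. */
static uint8_t freq[SKETCH_SIZE];

static uint32_t h(uint32_t key) { return (key * 2654435761u) % SKETCH_SIZE; }

void record_access(uint32_t key) {
    if (freq[h(key)] < 255)
        freq[h(key)]++;
}

/* FIFO cache: slot `head` is the next eviction candidate. */
static uint32_t cache[CACHE_SIZE];
static int used = 0, head = 0;

int cache_contains(uint32_t key) {
    for (int i = 0; i < used; i++)
        if (cache[i] == key)
            return 1;
    return 0;
}

/* Admission: replace the FIFO victim only if the newcomer's estimated
 * frequency exceeds the victim's. Returns 1 if the key ends up cached. */
int cache_admit(uint32_t key) {
    record_access(key);
    if (cache_contains(key))
        return 1;
    if (used < CACHE_SIZE) {            /* cache not yet full */
        cache[used++] = key;
        return 1;
    }
    uint32_t victim = cache[head];
    if (freq[h(key)] <= freq[h(victim)])
        return 0;                       /* newcomer denied admission */
    cache[head] = key;                  /* evict victim, admit newcomer */
    head = (head + 1) % CACHE_SIZE;
    return 1;
}
```

The appeal for latency hiding is that both the sketch counters and the FIFO victim slot are at addresses computable before any data is touched, keeping the number of dependent DRAM accesses small.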
Prefetch-friendly packet processing
In this section, we first identify data fetches causing instruction pipeline stalls which cannot be hidden by a conventional packet processing flow. Then, we design a prefetch-friendly packet processing flow to circumvent instruction pipeline stalls caused by the identified data fetches. Finally, we experimentally evaluate the performance gains obtained by using prefetch-friendly packet processing.
Related work
Prototypes of software NDN routers have been developed. Kirchner et al. [17] implemented their software NDN router, named Augustus, in two ways: a standalone monolithic forwarding engine based on the DPDK framework [15] and a modular one based on the Click framework. Though the forwarding speed of Augustus is high, it does not approach the potential forwarding speed achievable on computers. Hence, analyzing the bottlenecks of software NDN routers remains an open issue.
Conclusion
In this paper, we identified the ideal form of a software NDN router on computers via the following steps: 1) we conducted a detailed study of existing techniques for high-speed NDN packet forwarding and integrated them into a design rationale toward the realization of an ideal software NDN router; 2) we conducted microarchitectural and comprehensive bottleneck analyses of the software NDN router and revealed that hiding the DRAM access latency is vital to the realization of an ideal software NDN router.
Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
Junji Takemasa is currently an employee of KDDI Research, Inc. The submitted work was done while he was a student at the Graduate School of Information Science and Technology, Osaka University. Junji Takemasa has received the following research funding within the past five years.
- Grant-in-Aid for JSPS Fellows, No. 17J07276
  * Period: 2017–2019
  * Collaborators: None
CRediT authorship contribution statement
Junji Takemasa: Methodology, Software, Investigation, Writing - original draft. Yuki Koizumi: Conceptualization, Writing - original draft. Toru Hasegawa: Methodology, Writing - review & editing, Supervision.
Acknowledgement
This work has been supported by JSPS KAKENHI Grant Number 17H01733.
References (45)
- et al., High-speed data plane and network functions virtualization by vectorizing packet processing, Comp. Netw. (2019)
- et al., Exploiting parallelism in hierarchical content stores for high-speed ICN routers, Comp. Netw. (2017)
- et al., Achieving one billion key-value requests per second on a single server, IEEE Micro (2016)
- et al., A 50-Gb/s IP router, IEEE/ACM Trans. Network. (1998)
- et al., Towards a gigabit IP router, J. High Speed Netw. (1992)
- et al., Small forwarding tables for fast routing lookups, Proceedings of ACM SIGCOMM (1997)
- CIDR report, ...
- et al., Named data networking, ACM SIGCOMM Comp. Commun. Rev. (2014)
- December 2017 web server survey, ...
- et al., Scalable name-based packet forwarding: from millions to billions, Proceedings of ACM ICN (2015)
- Toward an ideal NDN router on a commercial off-the-shelf computer, Proceedings of ACM ICN
- Named data networking on a router: fast and DoS-resistant forwarding with hash tables, Proceedings of ACM/IEEE ANCS
- Poster: a method for designing high-speed software NDN routers, Proceedings of ACM ICN
- Caesar: a content router for high-speed forwarding on content names, Proceedings of ACM ANCS
- TinyLFU: a highly efficient cache admission policy, ACM Trans. Storage
- Augustus: a CCN router for programmable networks, Proceedings of ACM ICN
- Understanding sharded caching systems, Proceedings of IEEE INFOCOM
Cited by (8)
- LIGHT: A compatible, high-performance and scalable user-level network stack, Computer Networks (2023)
- Towards a Scalable Named Data Border Gateway Protocol, ICECCME 2022 (2022)
- Terabytes and Terabits/s Packet Caching in ICN Routers using Programmable Switches, IEEE CloudNet 2022 (2022)
- Dynamically Allocated Bloom Filter-Based PIT Architectures, IEEE Access (2022)
- A cache replacement strategy based on content features in named data networking, ACM International Conference Proceeding Series (2021)
- Vision: Toward 10 Tbps NDN forwarding with billion prefixes by programmable switches, ACM ICN 2021 (2021)
Junji Takemasa received his Bachelor and Master of Information Science degrees from Osaka University, Japan, in 2014 and 2016, respectively. He is pursuing the Ph.D. degree at the Graduate School of Information Science and Technology, Osaka University. His research interests include Information Centric Networking, high-speed network system and green networking. He is a member of IEEE, IEICE and IPSJ.
Yuki Koizumi is an associate professor of Graduate School of Information Science and Technology, Osaka University, Japan. He received his Master of Information Science and Ph.D. of Information Science degrees from Osaka University, Japan, in 2006 and 2009, respectively. His research interests include Information Centric Networking and mobile networking. He is a member of IEEE, ACM, and IEICE.
Toru Hasegawa is a professor of the Graduate School of Information Science and Technology, Osaka University. He received the B.E., the M.E. and Dr. Informatics degrees in information engineering from Kyoto University, Japan, in 1982, 1984 and 2000, respectively. After receiving the master degree, he worked as a research engineer at KDDI R&D labs. (former KDD R&D labs.) for 29 years and then moved to Osaka University. His current interests are the future Internet, Information Centric Networking, mobile computing and so on. He has published over 100 papers in peer-reviewed journals and international conference proceedings including MobiCom, ICNP, IEEE/ACM Transactions on Networking, and Computer Communications. He has served on the program or organization committees of several networking conferences such as ICNP, P2P, ICN, CloudNet, ICC, Globecom, etc., and as TPC co-chair of Testcom/Fates 2008, ICNP 2010, P2P 2011 and Global Internet Symposium 2014. He received the Meritorious Award on Radio of ARIB in 2003, the best tutorial paper award in 2014 from IEICE and the best paper award in 2015 from IEICE. He is a fellow of IPSJ and IEICE.