Elsevier

Computer Networks

Volume 199, 9 November 2021, 108437
Cache management for large data transfers and multipath forwarding strategies in Named Data Networking

https://doi.org/10.1016/j.comnet.2021.108437

Abstract

Named Data Networking (NDN) is a promising approach to provide fast in-network access to Compact Muon Solenoid (CMS) datasets. It proposes a content-centric rather than a host-centric approach to data retrieval. Data packets with unique and immutable names are retrieved from a content store (CS) using Interest packets. The current NDN architecture relies on forwarding strategies that depend only on on-path caching. Such a design does not take advantage of the cached content available on the adjacent off-path routers in the network, thus reducing data transfer efficiency. In this work, we propose a software-defined, storage-aware routing mechanism that leverages NDN router cache-states, software defined networking (SDN) and multipath forwarding strategies to improve the efficiency of very large data transfers. First, we propose a novel distributed multipath (D-MP) forwarding strategy and enhancements to the NDN Interest forwarding pipeline. In addition, we develop a centralized SDN-enabled control for the multipath forwarding strategy (S-MP), which leverages global knowledge of NDN network states to distribute Interests efficiently. We perform extensive evaluations of our proposed methods on an at-scale wide area network (WAN) testbed spanning six geographically separated sites. Our proposed solutions outperform the existing NDN forwarding strategies. The D-MP strategy yields performance gains of 10.4x to 12.5x over the default NDN implementation without in-network caching, and 12.2x to 18.4x with in-network caching enabled. For the S-MP strategy, we demonstrate performance improvements of 10.6x to 12.6x and 12.9x to 18.5x with in-network caching disabled and enabled, respectively. Further, we also present a comprehensive analysis of NDN cache management for large data transfers and propose a novel prefetching mechanism to improve data transfer performance.
Due to the inherent capacity limitations of the NDN router caches, we use SDN to provide an intelligent and efficient solution for data distribution and routing across multiple NDN router caches. We demonstrate how software-defined control can be used to partition and distribute large CMS files based on NDN router cache-state knowledge. Further, SDN control will also configure the router forwarding strategy to retrieve CMS data from the network. Our proposed solution demonstrates that the CMS datasets can be retrieved 28%–38% faster from the NDN routers’ caches than existing NDN approaches. Lastly, we develop a prefetching mechanism to improve the transfer performance of files that are not available in the router’s cache.

Introduction

Data management in high energy physics (HEP) is challenging due to its complexity and volume. These datasets are immutable once generated by the experiments; scientists repeatedly read and process these datasets. A critical challenge in the CMS workflow is how to deliver large volumes of data to researchers efficiently. An experiment data file has an average size of 2 Gigabytes, with file sizes ranging between 100 Megabytes and 20 Gigabytes. A complete dataset comprises multiple files, with the dataset sizes ranging from 2 to 100 Terabytes. Thus, providing speedy access to such datasets becomes a key enabler for data-intensive science research. The CMS experiment on the Large Hadron Collider (LHC) manages a large volume of data that currently exceeds 100 PB across multiple sites. The experiment manages approximately 35 PB of data (a combination of detector readouts and simulated readouts across various physics-related formats); this data is write-once and read-many. All CMS-managed data is immutable once written to permanent storage [3]. Through a combination of caching and pre-placement, CMS moves its data across over 50 data centers throughout the Worldwide LHC Computing Grid [4]. Data delivery for CMS experimental workflows is challenging due to the large size of the datasets. Further, exchanging the data between different sites – streaming to user laptops or offsite batch jobs – has historically required a burdensome set of middleware and dedicated computing infrastructure. Therefore, a better solution is to provide fail-over services and multiple repositories, and to assist in the synchronization across multiple data repositories, reducing the overheads on the original dataset source and, consequently, the data transfer latencies. In this work, we study how to leverage Information-Centric Networking (ICN) to provide faster in-network CMS data access to end-users.
Contrary to IP-based, host-centric Internet architectures, ICN emphasizes content by making it directly addressable and routable. Users request data by its name instead of by the IP address of the host that stores it.

Named Data Networking (NDN) [5] is one such architecture that proposes the use of names for fetching data instead of relying on addresses for identifying data locality. The end-user sends an Interest packet with the data name, and the network is responsible for both forwarding and caching the requested data. One of the main characteristics of NDN is in-network caching, where the router keeps a copy of the data to satisfy a future request. This reduces the latency arising from fetching the data from the source for all subsequent requests. The software defined networking (SDN) [6] paradigm has generated significant interest in the ICN community. SDN has been used to address name-based routing and forwarding [7], [8] by decoupling the ICN data plane from its control plane [9].

Under the NDN paradigm, a content store (CS) acts as a cache management data structure. The CS is an in-network cache that performs data lookups for incoming Interests and serves the consumers without the need for forwarding the Interests to the NDN producers. In the current NDN implementation, it is only beneficial to cache the data in the CS when the cached contents are available on the path to the content producer. This is a serious limitation, as it reduces the data transfer efficiency by ignoring the (requested) cached content available on adjacent/off-path routers in the network. The adjacent/off-path routers are generally closer (in terms of the number of hops or routing cost) to the consumer when compared to the NDN producer. Therefore, fetching the data from the producer and caching it only in the on-path router instead of also utilizing the adjacent/off-path routers is inefficient. Thus, fetching the data from both the on-path and off-path routers would greatly improve the data retrieval performance.
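The on-path limitation described above can be illustrated with a minimal sketch. It assumes a toy content store with least-recently-used eviction and a hypothetical helper, cheapest_source, that compares routing costs; neither the class nor the helper corresponds to the NFD implementation or to the strategies evaluated in this paper.

```python
from collections import OrderedDict


class ContentStore:
    """Toy LRU content store keyed by Data name (illustrative, not NFD's CS)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def insert(self, name, data):
        """Cache a Data packet, evicting the least recently used entry if full."""
        self._entries[name] = data
        self._entries.move_to_end(name)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def lookup(self, name):
        """Return the cached Data for an Interest name, or None on a miss."""
        if name not in self._entries:
            return None
        self._entries.move_to_end(name)
        return self._entries[name]


def cheapest_source(name, candidates, producer_cost):
    """Pick the cheapest place to fetch `name` (hypothetical helper).

    candidates: list of (label, routing_cost, ContentStore) tuples.
    Falls back to the producer when no candidate router holds the data.
    """
    best = ("producer", producer_cost)
    for label, cost, cs in candidates:
        if cs.lookup(name) is not None and cost < best[1]:
            best = (label, cost)
    return best


# Example: the on-path router misses, but an adjacent off-path router hits.
on_path = ContentStore(capacity=2)
off_path = ContentStore(capacity=2)
off_path.insert("/cms/file1/seg0", b"payload")
routers = [("on-path", 1, on_path), ("off-path", 2, off_path)]
source = cheapest_source("/cms/file1/seg0", routers, producer_cost=5)
```

In this example the off-path router (cost 2) is preferred over the producer (cost 5), which is the saving that an on-path-only strategy cannot exploit.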

In this paper, we propose a multipath forwarding strategy to address the above problem. Our first approach proposes enhancements to the existing NDN Forwarding Daemon (NFD) implementation. Specifically, we propose a forwarding strategy that retrieves non-overlapping data packets from multiple routers simultaneously. Further, the strategy provides additional flexibility in the per-router choice of the Interest pipeline depth configuration. Next, we propose a centralized approach using an SDN controller for managing/mapping the current contents of the CS. This approach allows us to make intelligent Interest pipeline forwarding decisions by analyzing the global view of the NDN network. The SDN controller analyzes the network state and redirects the incoming Interests to the off-path routers that have cached the requested content. In our work, we enhance the data retrieval process for both cases by allowing both the NDN consumer and the NDN routers to fetch the content from multiple off-path locations based on the network states. Our proposed approaches, while improving data transfer performance, also ensure congestion avoidance on a specific path by distributing Interests across multiple available paths.
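One way to picture the retrieval of non-overlapping data packets with per-router pipeline depths is the sketch below. It assumes a simple round-based split in which each router receives as many consecutive segment numbers per round as its configured pipeline depth. The function name assign_segments and the router labels are hypothetical, and the actual D-MP and S-MP algorithms are those given in Section 4, not this sketch.

```python
def assign_segments(num_segments, pipeline_depths):
    """Split segment numbers 0..num_segments-1 across routers without overlap.

    pipeline_depths: dict mapping a router label to its Interest pipeline
    depth (a positive integer). In each round, every router is assigned up to
    `depth` consecutive segments, so deeper pipelines carry more Interests.
    This weighting rule is an assumption made for illustration.
    """
    if not pipeline_depths:
        raise ValueError("at least one router is required")
    if any(depth <= 0 for depth in pipeline_depths.values()):
        raise ValueError("pipeline depths must be positive")

    assignment = {router: [] for router in pipeline_depths}
    next_segment = 0
    while next_segment < num_segments:
        for router, depth in pipeline_depths.items():
            take = min(depth, num_segments - next_segment)
            assignment[router].extend(range(next_segment, next_segment + take))
            next_segment += take
            if next_segment >= num_segments:
                break
    return assignment


# Example: ten segments, two routers with pipeline depths 2 and 3.
plan = assign_segments(10, {"router-a": 2, "router-b": 3})
```

Here router-a is asked for segments 0, 1, 5 and 6, and router-b for the remaining six, so no segment is requested twice and the load follows the configured depths.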

To employ NDN for CMS workflows, we must also address the critical challenge of how to store large files in the network efficiently. In the NDN paradigm, a content store (CS) is an in-network cache that performs data lookups for incoming Interests and serves the consumers without the need to forward the Interests to the NDN producers. Due to the limitation of the cache capacity on each NDN router, novel approaches for efficient cache management are necessary. One approach is to deploy the NDN routers with large memory. However, this will increase the deployment costs and is therefore inefficient. Another approach is to use solid-state drives (SSD) for caching the data; the use of SSDs for caching not only increases the overall cost of the deployment but also adds data retrieval latencies. In the ICN community, software-defined approaches for NDN routing intelligence and caching management are an active area of research. To address the above problems, we propose a solution that employs an SDN controller to manage cache-aware NDN routers. Our proposed approach works in two phases. First, during the file retrieval process, if the file is not cached in the network (and resides on the producer storage), the Interest packet will be forwarded to the centralized controller for the best retrieval strategy. Small files are retrieved using the default NDN approach. However, for large files that cannot be cached on a single router, a distributed retrieval approach using multiple router content stores will be used. Second, depending on whether the requested file is already cached on multiple routers or not, the controller will provide a strategy for distributed file retrieval (see Section 3). Further, our proposed system architecture also enables the prefetch feature, where parts of the file can be prefetched and cached on different routers simultaneously.
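The controller decision described above can be sketched as follows, under stated assumptions. The sketch treats a file as "small" when it fits in the free cache space of a single router, fills routers in decreasing order of free space, and reports a hypothetical "no-cache" mode when the routers together cannot hold the file. The function plan_retrieval, the mode labels and the capacity units are illustrative and are not taken from the paper's controller.

```python
def plan_retrieval(file_size, free_capacity):
    """Choose a retrieval plan from router cache state (illustrative only).

    file_size: size of the requested file, in the same unit as the capacities.
    free_capacity: dict mapping a router label to its free cache space.
    Returns a dict with a "mode" and a list of (router, offset, length) chunks.
    """
    largest = max(free_capacity.values(), default=0)
    if file_size <= largest:
        # Assumed rule: a file that fits on one router uses the default path.
        return {"mode": "default", "chunks": []}
    if file_size > sum(free_capacity.values()):
        # Assumed fallback: the combined caches cannot hold the whole file.
        return {"mode": "no-cache", "chunks": []}

    chunks = []
    offset = 0
    by_free_space = sorted(free_capacity.items(), key=lambda kv: kv[1], reverse=True)
    for router, free in by_free_space:
        if offset >= file_size:
            break
        length = min(free, file_size - offset)
        if length > 0:
            chunks.append((router, offset, length))
            offset += length
    return {"mode": "distributed", "chunks": chunks}


# Example with capacities in gigabytes: a 5 GB file does not fit on one router.
caches = {"router-a": 4, "router-b": 2}
small = plan_retrieval(1, caches)
large = plan_retrieval(5, caches)
```

In this example the 1 GB file takes the default path, while the 5 GB file is split into a 4 GB chunk for router-a and a 1 GB chunk for router-b.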

Specifically, our solution, in comparison to the original NDN data repository and synchronization implementations, exploits multiple paths and off-path routers (not possible in the default NDN implementation) to optimize end-to-end data transfers. Lastly, our solution provides a better data management solution by offloading key decision-making tasks to an SDN controller.

The main contributions of our work are listed below:

  • 1. We propose a distributed multipath (D-MP) forwarding strategy for NDN Interest pipeline processing and data retrieval. This approach demonstrates simultaneous data retrieval from a set of n routers with pre-configured Interest pipeline depths. Our D-MP strategy performs over 10x better than the default NDN implementation.

  • 2. We propose a centralized, SDN-enabled control for our multipath forwarding strategy (S-MP). We show that the centralized control (S-MP), unlike the D-MP case, provides additional benefits due to the knowledge of the global NDN network and cache states.

  • 3. For both D-MP and S-MP approaches, we present NFD configuration algorithms detailing the consumers’ Interest and routing pipelines, interfaces and the Interest distribution strategy.

  • 4. We present cache management strategies for large data transfers using NDN. Using software-defined control, we present strategies for partitioning and distributing large CMS files based on NDN routers’ cache-state knowledge.

  • 5. We also develop a prefetching mechanism to reduce the data retrieval latency specifically for large file transfers. Our proposed approach further improves the data transfer performance by optimizing the file retrieval time while reducing the path latency.

  • 6. Lastly, we evaluate the performance of our multipath forwarding and cache management solutions for large data transfers on an at-scale, geographically distributed wide area network (WAN) research testbed and provide valuable WAN performance insights.

The paper is organized as follows. Section 2 presents background on named data networking (NDN), software defined networking (SDN) for NDN and the related works. In Section 3, we describe our proposed system architecture for NDN multipath forwarding strategies and SDN control of NDN. Section 4 outlines our solution approach for NDN Interest pipeline management for both the D-MP and S-MP use cases. In Section 5, we describe our proposed approach for NDN cache management for large file transfers. In Section 6, we present an analysis of the NDN multipath strategies. We describe our evaluation framework, network testbed setup and experimental design for multipath forwarding strategies in Section 7. In Section 8, we present extensive results and discussions for our proposed multipath forwarding strategies, SDN control for NDN and large data transfer cache management approaches. Finally, we conclude our work in Section 9.

Section snippets

Named Data Networking

Traditional IP networking has problems such as IP mobility, network address translation (NAT) traversal, and address space limits. Named Data Networking (NDN) [5] is designed to mitigate such problems. NDN is a Future Internet Architecture (FIA) [10] project that proposes re-designing the current host-centric Internet architecture. It is developed on a name-based packet forwarding and routing scheme, using a hierarchical and unbounded namespace. These ensure the communication

System Architecture

NFD employs a per-namespace forwarding strategy to forward Interests. The strategy choice affects packet forwarding decisions and plays an important role in fetching the data from a given NDN router. Several Interest forwarding strategies are available for use by the NFD, including best route, multicast, client control, NCC (implemented from CCNx, i.e., CCN backward), access router, and the adaptive SRTT (smoothed round-trip time)-based forwarding (ASF) [29] strategy.

Although sufficient for

Multipath Forwarding Strategies for NDN

The default NDN implementation relies on data caching only on the path to the content producer. This limitation reduces the data transfer efficiency as it ignores the (requested) cached content that may be available on adjacent/off-path routers in the network. These off-path routers are generally closer to the consumer when compared to the content producer. Therefore, data retrieval from both on- and off-path routers can greatly improve data retrieval performance. In this section, we outline

Cache Management for Large Data Transfers

In this section, we describe the implementation approach for cache management and data distribution.

NDN Multipath Strategy Analysis

In order to understand the benefits of the multipath strategies detailed in this work, we present the analysis and comparison of our proposed multipath forwarding strategy with the default NDN strategy. The default NDN strategy relies on sending a single Interest packet from the NDN consumer to the NDN producer through a single pre-configured path. In contrast, our proposed multipath forwarding strategies employ multiple paths to retrieve data simultaneously from nearby or adjacent off-path

Evaluation

In this section, we describe our network testbed setup, datasets used in the performance evaluation, associated parameters, and the experimental design.

Results and Discussion

In this section, we present the performance results of our proposed multipath forwarding strategies and cache management for large data transfers, and demonstrate the benefits of our proposed prefetching feature.

Conclusions

In this paper, we present an architecture that uses centralized control with NDN to provide faster in-network access to large datasets. We use SDN to provide an intelligent and efficient solution for data distribution and retrieval across multiple NDN routers’ caches. The SDN controller splits large data files and distributes them across multiple NDN routers’ content stores. Our proposed system architecture results in a performance gain of 28.1%–38% compared to the current NDN architecture. Moreover,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This material is based upon work supported by the National Science Foundation, United States under Grant Numbers OAC-1541442 and CNS-1817105. This work was completed using the Holland Computing Center of the University of Nebraska, which receives support from the Nebraska Research Initiative. The authors would like to thank Garhan Attebury, Holland Computing Center at UNL for his valuable support.

References (32)

  • Aubry E. et al. Implementation and Evaluation of a Controller-Based Forwarding Scheme for NDN

  • Vahlenkamp M. et al. Enabling Information Centric Networking in IP Networks Using SDN

  • Rexford J. et al. Future internet architecture: clean-slate versus evolutionary research. Commun. ACM (2010)

  • Feamster N. et al. The road to SDN: an intellectual history of programmable networks. ACM SIGCOMM Comput. Commun. Rev. (2014)

  • Li J. et al. A novel forwarding and routing mechanism design in SDN-based NDN architecture. Front. Inf. Technol. Electron. Eng. (2018)

  • Combe T. et al. An SDN and NFV Use Case: NDN Implementation and Security Monitoring

    Mohammad Alhowaidi received his Ph.D. degree in Computer Engineering from the University of Nebraska-Lincoln. He has a master’s degree in Computer Science and Engineering from Linkoping University, Sweden. He received his bachelor’s degree in Computer Engineering from Jordan University of Science and Technology, Jordan. His research interests are in future internet architectures, software defined networking, optical networks, and resource allocation in cloud networks.

    Deepak Nadig is currently an Assistant Professor in the Department of Computer and Information Technology at Purdue University. From 2009 to 2015, Dr. Nadig was the Director of Technology and Research at SOLUTT Corporation, India. He was instrumental in leading the networking and wireless operations in 4G/LTE and multi-gigabit wireless technologies. His research focuses on the interplay between Software Defined Networks and Network Functions Virtualization for developing secure and intelligent next-generation networks. His research interests are in Computer Networks, Software Defined Networks, Network Functions Virtualization, Cloud-native Infrastructure, Network Security and AI/ML applications to networking. Dr. Nadig has a Ph.D. in Computer Engineering from the University of Nebraska-Lincoln (UNL).

    Boyang Hu is currently a Ph.D. student in Computer Science and Engineering at the University of Nebraska-Lincoln (UNL). He received the B.S. degree from Beijing Jiaotong University, China, in 2011 and the M.S. degree from the University of Maryland, College Park, US, in 2013. His research interests include Network Security, Software Defined Networks (SDN), and Network Functions Virtualization (NFV).

    Byrav Ramamurthy is currently a Professor and former Graduate Chair in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln (UNL). He is the author of the book “Design of Optical WDM Networks LAN, MAN and WAN Architectures” and a co-author of the book “Secure Group Communications over Data Networks” published by Kluwer Academic Publishers/Springer in 2000 and 2004 respectively. He served as the Chair of the IEEE Communication Society’s Optical Networking Technical Committee (ONTC) during 2009–2011. He served as the IEEE INFOCOM 2011 TPC Co-Chair. He is currently the Editor-in-Chief for the Springer Photonic Network Communications (PNET) journal. His research areas include optical and wireless networks, peer-to-peer networks for multimedia streaming, network security and telecommunications. His research work is supported by the U.S. National Science Foundation, U.S. Department of Energy, U.S. Department of Agriculture, NASA, AT&T Corporation, Agilent Tech., Ciena, HP and OPNET Inc.

    Brian Bockelman is currently an Associate Scientist at the Morgridge Institute for Research, University of Wisconsin-Madison. His research interests are in Research Computing and Distributed High-Throughput Computing (DHTC). For over a decade, he has worked with the Open Science Grid on issues in distributed high throughput computing and now serves as the Technology Area Coordinator, leading the evolution of the technologies used by the OSG. Within Nebraska, Dr. Bockelman served as a key staff member of the Holland Computing Center (2008–2019) and as an Associate Research Professor in the Computer Science and Engineering department and worked on the CMS project, which hosts significant computing resources at the Holland Computing Center.

    An earlier version of this work was presented at the 2018 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) (Alhowaidi et al., 2018) [1] and the 2019 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) (Alhowaidi et al., 2019) [2].

    1 The authors acknowledge the valuable contributions of Dr. David R. Swanson (Deceased), Holland Computing Center, UNL, to this work.

    2 This work was conducted when the second author was at the University of Nebraska-Lincoln.
