Distributed learning dynamics of Multi-Armed Bandits for edge intelligence

https://doi.org/10.1016/j.sysarc.2020.101919

Abstract

Multi-agent decision making is a fundamental problem in edge intelligence. In this paper, we study this problem for IoT networks under the distributed Multi-Armed Bandits (MAB) model. Most existing works on distributed MAB require long-lived, stable networks of powerful devices and hence may not be suitable for mobile IoT networks with harsh resource constraints. To meet the challenge of resource constraints in the mobile IoT environment, we propose a lightweight and robust learning algorithm for a dynamic network that allows topology changes. In our model, each agent is assumed to have only limited memory, and agents communicate with each other asynchronously. Moreover, we assume that the bandwidth for exchanging information is limited, so that each agent can transmit $O(\log_2 K)$ bits (where $K$ denotes the number of arms) per communication. Rigorous analysis shows that, despite these harsh constraints, the best arm/option can be identified collaboratively by the agents and the algorithm converges efficiently. Extensive experiments illustrate that the proposed algorithm exhibits good efficiency and stability in mobile settings.

Introduction

Recent years have witnessed the rise of the Internet of Things (IoT), driven by rapid advances in chip integration, wireless communication, big data, and artificial intelligence. IoT devices, which are distributed at the network edge, allow data to be exploited efficiently and facilitate data aggregation through local computation and communication with each other. With these merits, IoT is a good fit for implementing large-scale machine learning algorithms and developing edge intelligence, which in turn helps to complete IoT tasks such as decision making and learning the underlying environment. Hence the learning paradigm has recently shifted from centralized cloud computing to edge computing (fog computing), giving rise to edge intelligence. Edge intelligence refers to a set of connected edge devices (e.g., IoT devices, smartphones, UAVs) that are used for data collection, caching, processing, and analysis based on artificial intelligence [1]. However, traditional machine learning algorithms are typically demanding in computing resources [2], [3], e.g., CPU, GPU, and memory, which is at odds with the lightweight computation capacity, limited memory, and mobility of edge devices. To push computation tasks from the cloud server to the edge, we have to devise lightweight learning algorithms [4], [5], [6] for the edge (e.g., IoT networks).

Multi-agent decision making is a common task in edge intelligence. We employ distributed multi-armed bandits (MAB) to formalize our multi-agent decision-making problem, where multiple networked agents face a set of arms (options) and each arm yields a stochastic reward whose mean is unknown to the agents. The agents' goal is to learn the best arm/option collaboratively by iteratively exchanging local states or opinions with each other until they reach a consensus. In fact, distributed MAB algorithms for large-scale networks have been studied extensively (see Section 2). Unfortunately, most of these works require long-lived, stable networks of powerful devices, which makes them unsuitable for edge intelligence. A natural question is whether an efficient and robust collaborative learning algorithm can be devised that overcomes the constraints of IoT devices and can be implemented even in mobile IoT networks.

To answer this question, we present in this paper a collaborative learning algorithm for dynamic IoT networks. In particular, the network is modeled as a connected network over a fixed set of agents whose topology changes arbitrarily over time [7], [8], [9], [10]. To make the model suitable for IoT, we also impose some necessary constraints on it. Firstly, we assume that each agent has only bounded memory, so that no learning history can be stored. Secondly, the bandwidth of the communication channel is limited, so that each agent can transmit at most $\log_2 K$ bits (where $K$ denotes the number of arms) per communication. Lastly, time is assumed to be continuous and agents act asynchronously: each agent is equipped with an independent Poisson clock with a common parameter. The Poisson clock model is natural for modeling swarm intelligence systems that are not fully synchronized [11], [12].
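The bandwidth and asynchrony assumptions above can be made concrete with a small simulation. The following is a minimal sketch, not the paper's implementation: the function names `bits_per_message` and `poisson_clock_schedule`, the rate value, and the time horizon are all hypothetical choices for illustration. It assumes a message carries a single arm index, and it uses the standard fact that a Poisson clock with rate λ ticks at exponentially distributed intervals with mean 1/λ.

```python
import heapq
import random


def bits_per_message(num_arms):
    """Bits needed to encode one arm index out of num_arms, i.e. ceil(log2 K)."""
    # (K - 1).bit_length() equals ceil(log2 K) for K >= 2; use 1 bit when K == 1.
    return max(1, (num_arms - 1).bit_length())


def poisson_clock_schedule(num_agents, rate, horizon, seed=0):
    """Return the time-ordered list of (time, agent) wake-ups up to `horizon`.

    Each agent owns an independent Poisson clock with the same rate, so the
    gaps between its consecutive ticks are i.i.d. exponential(rate).
    """
    rng = random.Random(seed)
    # One pending tick per agent, kept in a min-heap ordered by tick time.
    heap = [(rng.expovariate(rate), agent) for agent in range(num_agents)]
    heapq.heapify(heap)
    events = []
    while heap:
        t, agent = heapq.heappop(heap)
        if t > horizon:
            continue  # this agent has no more ticks inside the horizon
        events.append((t, agent))
        heapq.heappush(heap, (t + rng.expovariate(rate), agent))
    return events


if __name__ == "__main__":
    print("bits for K=10 arms:", bits_per_message(10))
    for t, agent in poisson_clock_schedule(num_agents=3, rate=1.0, horizon=3.0):
        print(f"t={t:.3f}: agent {agent} wakes up")
```

Because the clocks are independent and continuous, two agents wake up at exactly the same instant with probability zero, which is why such a model lets agents be treated as acting one at a time.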

Under the dynamic network model described above, we propose a distributed collaborative learning algorithm that can be implemented in dynamic networks under these IoT constraints. Detailed and rigorous analysis shows that, with high probability, every agent eventually pulls only the best arm. We also conducted extensive experiments to confirm the efficacy of the proposed distributed learning algorithm. The results show that the algorithm converges quickly, and hence that it is robust and efficient in realistic settings.

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 presents our system model. Sections 4 and 5 present the learning algorithm. We give the detailed analysis in Section 6. Section 7 reports the practical performance evaluation. Finally, we conclude the paper in Section 8.

Section snippets

Related work

In recent years, multi-armed bandit (MAB) algorithms have received extensive attention and in-depth study, given that MAB settings are not only challenging in theory [13], [14], [15], [16] but also useful in practice [17], [18], [19], [20]. The basic MAB setting is concerned with identifying a single optimal arm, whose reward is unknown a priori, among a group of candidate arms by trying one arm at a time and observing the rewards [21]. In the centralized setting, there are two main research directions

Model

We consider a dynamic network consisting of $N$ agents (also called nodes). The communication model is represented by an undirected graph $G=(\mathcal{N},\mathcal{E})$, where $\mathcal{N}=\{1,\dots,N\}$ is the set of agents and $\mathcal{E}$ is the set of edges. The agents exchange messages with their neighbors in the graph $G$. Let $V_i=\{i'\in\mathcal{N} \mid (i,i')\in\mathcal{E}\}$ denote the neighbors of agent $i$. The topology of the communication network can change arbitrarily over time. More specifically, the edges of the graph change constantly over a fixed set of agents. The neighbors of
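As an illustration of the notation, the neighbor sets of one snapshot of the graph can be computed from its edge list. This is a minimal sketch under stated assumptions: the helper names `neighbor_sets` and `is_connected` are hypothetical, agents are labeled 1 to N as in the model, and a dynamic network is treated as a sequence of such snapshots over the same agent set. The connectivity check reflects the introduction's assumption that the network is connected.

```python
from collections import deque


def neighbor_sets(num_agents, edges):
    """Map each agent i in {1, ..., N} to its neighbor set in an undirected graph."""
    neighbors = {i: set() for i in range(1, num_agents + 1)}
    for i, j in edges:
        if i == j:
            continue  # ignore self-loops; an agent is not its own neighbor
        neighbors[i].add(j)
        neighbors[j].add(i)
    return neighbors


def is_connected(neighbors):
    """Breadth-first search from an arbitrary agent; True if all are reached."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(neighbors)


if __name__ == "__main__":
    # Two snapshots of a dynamic topology over the same four agents.
    snapshots = [
        [(1, 2), (2, 3), (3, 4)],
        [(1, 3), (2, 3), (2, 4)],
    ]
    for step, edges in enumerate(snapshots):
        nb = neighbor_sets(4, edges)
        print(step, nb, "connected:", is_connected(nb))
```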

Algorithm

In order to identify the best arm $a_1$ efficiently, each agent collaborates with its neighbors. Two-step distributed learning dynamics consisting of sampling and adopting have been widely used in the literature [48], [49]. In previous works, however, the convergence of this learning process relies heavily on strong constraints on the network topology, which makes it hard to implement in general scenarios, let alone dynamic settings. We here adapt the two-step learning process into a three-step

Generating the selection matrix

In this section, we describe how to produce the probability selection matrix P using the Maximum-Degree (MD) matrix and the Metropolis–Hastings (MH) matrix. The two matrices were originally proposed for analyzing Markov chains, such as in [50]. The MD and MH approaches can be implemented in different scenarios. Specifically, in the MD approach, each agent needs to know the maximum degree and its own degree, while in the MH approach, only the node's own degree and its neighbors' degrees are needed. We will
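The two constructions can be sketched as follows. This excerpt does not show the paper's exact definitions, so the code below is an assumption: it uses the forms commonly found in the Markov chain literature, where the max-degree chain puts weight 1/d_max on every edge and the Metropolis–Hastings chain puts weight 1/max(d_i, d_j) on edge (i, j), with the leftover probability placed on the diagonal. The function names and the dict-of-sets graph representation are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def max_degree_matrix(neighbors):
    """Max-degree weights: P[i][j] = 1/d_max on edges, remainder on the diagonal.

    Agent i only needs d_max and its own degree to fill in its row.
    """
    nodes = sorted(neighbors)
    d_max = max(len(neighbors[i]) for i in nodes)
    if d_max == 0:
        raise ValueError("graph has no edges")
    P = {i: {j: 0.0 for j in nodes} for i in nodes}
    for i in nodes:
        for j in neighbors[i]:
            P[i][j] = 1.0 / d_max
        P[i][i] = 1.0 - len(neighbors[i]) / d_max
    return P


def metropolis_hastings_matrix(neighbors):
    """Metropolis-Hastings weights: P[i][j] = 1/max(d_i, d_j) on edges.

    Agent i only needs its own degree and the degrees of its neighbors.
    """
    nodes = sorted(neighbors)
    P = {i: {j: 0.0 for j in nodes} for i in nodes}
    for i in nodes:
        for j in neighbors[i]:
            P[i][j] = 1.0 / max(len(neighbors[i]), len(neighbors[j]))
        # Whatever probability mass is left stays with agent i itself.
        P[i][i] = 1.0 - sum(P[i][j] for j in neighbors[i])
    return P


if __name__ == "__main__":
    # Triangle 1-2-3 with a pendant agent 4 attached to agent 3.
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for name, build in (("MD", max_degree_matrix), ("MH", metropolis_hastings_matrix)):
        P = build(graph)
        print(name)
        for i in sorted(P):
            print("  ", [round(P[i][j], 3) for j in sorted(P)])
```

Under these definitions both matrices are symmetric and every row sums to one. The practical difference is the information each agent needs: MD requires a global quantity (the maximum degree), whereas MH relies only on degrees available from direct neighbors, which may be easier to maintain when the topology changes.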

Analysis

In this section, we analyze the convergence of our proposed collaborative learning process.

Experiments

In this section, we evaluate the performance of the learning algorithm through simulation experiments. We analyze the impact of the parameters, including the number of arms M and the number of agents N, on its performance. The two approaches to setting up the probability selection matrix P, the Maximum-Degree Algorithm (MD) and the Metropolis–Hastings Algorithm (MH), are compared in the dynamic setting.

We use the proportion of edges that change with time in the communication model to define the

Conclusion

In this paper, we presented a distributed best arm identification algorithm for dynamic IoT networks that allow topology changes. Our algorithm is designed for the mobile IoT environment, so that each agent can learn the best option with lightweight computation, limited memory, and asynchronous communication. Specifically, two different methods of message passing are provided. Extensive experiments illustrated that the proposed algorithm converges quickly and can be implemented well in practice.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Key R&D Program of China under grant No. 2019YFB2102600, NSFC, China (No. 61971269, 61832012, 61702304), the Shandong Provincial Natural Science Foundation, China (Grant No. ZR2017QF005), and the Industrial Internet Innovation and Development Project in 2019 of China.


References (50)

  • Singh J. et al. A survey and taxonomy on energy management schemes in wireless sensor networks. J. Syst. Archit. (2020)
  • Zhang J. et al. On-demand deployment for IoT applications. J. Syst. Archit. (2020)
  • Wan S. et al. Cognitive computing and wireless communications on the edge for healthcare service robots. Comput. Commun. (2020)
  • Xu D. et al. A survey on edge intelligence. (2020)
  • Z. Ren, S. Liang, P. Li, S. Wang, M. de Rijke, Social collaborative viewpoint regression with explainable...
  • Z. Ren, M.-H. Peetz, S. Liang, W. Van Dolen, M. De Rijke, Hierarchical multi-label classification of social text...
  • Z. Ren, M. de Rijke, Summarizing contrastive themes via hierarchical non-parametric processes, in: Proceedings of the...
  • Yang H. et al. Reliable data storage in heterogeneous wireless sensor networks by jointly optimizing routing and storage node deployment. Tsinghua Sci. Technol. (2020)
  • Yu D. et al. Implementing abstract MAC layer in dynamic networks. IEEE Trans. Mob. Comput. (2020)
  • Hua Q.-S. et al. Faster parallel core maintenance algorithms in dynamic graphs. IEEE Trans. Parallel Distrib. Syst. (2019)
  • Li K. et al. Seed-free graph de-anonymiztiation with adversarial learning
  • Ross S. Introduction to Probability Models. (2014)
  • Hajek B. Random Processes for Engineers. (2015)
  • Even-Dar E. et al. PAC bounds for multi-armed bandit and Markov decision processes
  • Bubeck S. et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. (2012)
  • Chen S. et al. Privacy-preserving collaborative learning for multi-armed bandits in IoT. IEEE Internet Things J. (2020)
  • Yuan Y. et al. Distributed social learning with imperfect information. IEEE Trans. Netw. Sci. Eng. (2020)
  • Kohli P. et al. A fast bandit algorithm for recommendation to users with heterogenous tastes
  • Chakrabarti D. et al. Mortal multi-armed bandits
  • Kuleshov V. et al. Algorithms for multi-armed bandit problems. (2014)
  • Li F. et al. Multi-armed-bandit-based spectrum scheduling algorithms in wireless networks: A survey. IEEE Wirel. Commun. (2020)
  • Auer P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. (2002)
  • Even-Dar E. et al. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Mach. Learn. Res. (2006)
  • Audibert J. et al. Best arm identification in multi-armed bandits
  • Anandkumar A. et al. Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE J. Sel. Areas Commun. (2011)

Shuzhen Chen received the B.Sc. degree in 2019 from the School of Computer Science and Technology, Shandong University. She is currently a postgraduate student in the Department of Computer Science and Technology, Shandong University. Her research interests include distributed computing, wireless and mobile security.

Youming Tao is currently an undergraduate student in Taishan College, Shandong University, Shandong, China. He currently focuses on fundamental problems in decentralized machine learning algorithm design, distributed computing, mechanism design and privacy-preserving data analytics.

Dongxiao Yu received the B.Sc. degree in 2006 from the School of Mathematics, Shandong University and the Ph.D. degree in 2014 from the Department of Computer Science, The University of Hong Kong. He became an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, in 2016. He is currently a professor in the School of Computer Science and Technology, Shandong University. His research interests include wireless networks, distributed computing and graph algorithms.

Feng Li received his BS and MS degrees in Computer Science from Shandong Normal University, China, in 2007, and Shandong University, China, in 2010, respectively. He got his Ph.D. degree (also in Computer Science) from Nanyang Technological University, Singapore, in 2015. From 2014 to 2015, he worked as a research fellow in National University of Singapore, Singapore. He then joined School of Computer Science and Technology, Shandong University, China, where he is currently an associate professor. His research interests include distributed algorithms and systems, wireless networking, mobile computing, and Internet of Things.

Bei Gong received his B.S. degree from Shandong University in 2005, and the Ph.D. degree from the Beijing University of Technology in 2012. He has participated in six national invention patents and one monograph textbook. In the past five years, he has published more than 30 papers in leading SCI/EI-indexed journals and top international conferences in relevant research fields. His research interests include trusted computing, Internet of Things security, mobile Internet of Things, and mobile edge computing. He has presided over 8 national projects, such as those of the National Natural Science Foundation, and 6 provincial and ministerial projects, such as the general science and technology program of the Beijing Municipal Education Commission.
