Distributed learning dynamics of Multi-Armed Bandits for edge intelligence
Introduction
Recent years have witnessed the rise of the Internet of Things (IoT), driven by rapid advances in chip integration, wireless communication, big data, and artificial intelligence. IoT devices, which are distributed at the network edge, allow data to be explored efficiently and facilitate data aggregation through local computation and communication with each other. With these merits, IoT is a good fit for implementing large-scale machine learning algorithms and developing edge intelligence, which in turn helps to complete IoT tasks such as decision making and learning the underlying environment. Hence, the learning paradigm has recently shifted from centralized cloud computing to edge computing (fog computing), which gives rise to edge intelligence. Edge intelligence refers to a set of connected edge devices (e.g., IoT devices, smartphones, UAVs) that is used for data collection, caching, processing, and analysis based on artificial intelligence [1]. However, traditional machine learning algorithms typically demand substantial computing resources [2], [3], e.g., CPU, GPU, and memory, which conflicts with the lightweight computation capacity, limited memory, and mobility of edge devices. To push computation tasks from the cloud server to the edge, we have to devise lightweight learning algorithms [4], [5], [6] for the edge (e.g., IoT networks).
Multi-agent decision making is a common task in edge intelligence. We employ distributed multi-armed bandits (MAB) to formalize our multi-agent decision-making problem, where multiple networked agents face a set of arms (options) and each arm yields a stochastic reward whose mean is unknown to the agents. The agents' goal is to learn the best arm collaboratively by iteratively exchanging local states or opinions with each other until they reach a consensus. Distributed MAB algorithms for large-scale networks have been studied extensively (see Section 2). Unfortunately, most of these works require long-lived, stable networks of powerful devices, which makes them unsuitable for edge intelligence. A natural question is whether an efficient and robust collaborative learning algorithm can be devised that overcomes the constraints of IoT devices and can be implemented even in mobile IoT networks.
To address this question, we present in this paper a collaborative learning algorithm for dynamic IoT networks. In particular, the network is modeled as a connected network on a fixed set of agents whose topology changes arbitrarily over time [7], [8], [9], [10]. To suit IoT settings, we also impose some necessary constraints on the system model. Firstly, we assume that each agent has only bounded memory, so that no learning history can be stored. Secondly, the bandwidth of the communication channel is limited, so that each agent can transmit at most ( denotes the number of arms) bits per communication. Lastly, time is assumed to be continuous and agents take actions asynchronously: each agent is equipped with an independent Poisson clock with a common parameter. The Poisson clock model is natural for modeling swarm intelligent systems that are not fully synchronized [11], [12].
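The asynchronous timing model can be illustrated with a minimal Python sketch. It assumes each agent's clock ticks after i.i.d. exponential waiting times with the same rate; the function name `poisson_activations`, the rate, and the time horizon are hypothetical choices made for illustration and do not come from the paper.

```python
import random

def poisson_activations(num_agents, rate, horizon, seed=0):
    """Return a time-sorted list of (time, agent) activation events.

    Each agent owns an independent Poisson clock with the same rate, so its
    inter-activation times are i.i.d. exponential with mean 1 / rate.
    (Function name and parameters are illustrative assumptions.)
    """
    rng = random.Random(seed)
    events = []
    for agent in range(num_agents):
        t = rng.expovariate(rate)
        while t <= horizon:
            events.append((t, agent))
            t += rng.expovariate(rate)
    events.sort()
    return events

events = poisson_activations(num_agents=5, rate=1.0, horizon=10.0)
for t, agent in events[:5]:
    print(f"t={t:.3f}: agent {agent} wakes up and acts")
```

Because the clocks are independent, the merged event stream is itself a Poisson process whose rate is the number of agents times the common rate, and each event belongs to a uniformly random agent. Simulations can therefore activate one uniformly chosen agent per step.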
Under the dynamic network model described above, we propose a distributed collaborative learning algorithm that can be implemented in dynamic networks under the IoT constraints. A detailed and rigorous analysis guarantees that, with high probability, every agent eventually pulls only the best arm. We also conducted extensive experiments to confirm the efficacy of the proposed algorithm. The results show that the algorithm converges quickly, which indicates that it is robust and efficient in realistic settings.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents our system model. Sections 4 and 5 present the learning algorithm. We give the detailed analysis in Section 6. Section 7 reports the practical performance evaluation. Finally, we conclude the paper in Section 8.
Section snippets
Related work
In recent years, multi-armed bandit (MAB) algorithms have received extensive attention and in-depth study, given that MAB settings are not only challenging in theory [13], [14], [15], [16] but also useful in practice [17], [18], [19], [20]. The basic MAB setting is concerned with learning the a priori unknown rewards in order to find a single optimal arm among a group of candidate arms, by trying one arm at a time and observing the rewards [21]. In the centralized setting, there are two main research directions
Model
We consider a dynamic network consisting of agents (also called nodes). The communication model is represented by an undirected graph . The agents exchange messages with their neighbors in the graph , where and is the set of edges. Let denote the neighbors of agent . The topology of the communication network can change arbitrarily over time. More specifically, the edges of the graph change constantly, while the set of agents stays fixed. The neighbors of
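The network model described here can be sketched in Python as a sequence of undirected edge sets over a fixed agent set. The helper names `neighbors` and `is_connected` and the three example snapshots are hypothetical and serve only as illustration; the connectivity check reflects the introduction's statement that the network is modeled as connected.

```python
from collections import deque

def neighbors(edges, agent):
    """Return the set of agents adjacent to `agent` in an undirected edge set."""
    result = set()
    for u, v in edges:
        if u == agent:
            result.add(v)
        elif v == agent:
            result.add(u)
    return result

def is_connected(num_agents, edges):
    """Breadth-first search check that all agents 0..num_agents-1 are reachable."""
    if num_agents == 0:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors(edges, u):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == num_agents

# Hypothetical topology snapshots on the fixed agent set {0, 1, 2, 3}.
snapshots = [
    {(0, 1), (1, 2), (2, 3)},          # path
    {(0, 2), (1, 2), (2, 3)},          # star centred at agent 2
    {(0, 1), (1, 2), (2, 3), (0, 3)},  # ring
]

for t, edges in enumerate(snapshots):
    print(t, sorted(neighbors(edges, 2)), is_connected(4, edges))
```

The agent set never changes between snapshots; only the edge set, and hence each agent's neighborhood, does.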
Algorithm
To identify the best arm efficiently, each agent collaborates with its neighbors. Two-step distributed learning dynamics consisting of sampling and adopting have been widely used in the literature [48], [49]. In previous works, however, the convergence of this learning process relies heavily on strong constraints on the network topology, which makes it hard to implement in general scenarios, let alone dynamic settings. We here adapt the two-step learning process into a three-step
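For intuition, a generic two-step sample-then-adopt dynamic can be sketched as below. This is not the paper's three-step algorithm, whose details are not given in this excerpt. The Bernoulli rewards, the exploration probability `mu`, the rule "adopt the sampled arm only if it pays reward 1", the static graph, and all names are assumptions made for the sake of a runnable example.

```python
import random

def sample_and_adopt(adjacency, means, steps, mu=0.1, seed=0):
    """Run a simple two-step (sample, adopt) dynamic on a fixed graph.

    adjacency: dict mapping each agent to a list of its neighbours.
    means:     Bernoulli reward mean of each arm (unknown to the agents).
    Each agent stores only its currently adopted arm, with no history.
    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    agents = list(adjacency)
    num_arms = len(means)
    choice = {a: rng.randrange(num_arms) for a in agents}
    for _ in range(steps):
        agent = rng.choice(agents)  # one agent wakes up at a time
        # Step 1 (sample): copy a random neighbour's arm, or explore.
        if rng.random() < mu or not adjacency[agent]:
            candidate = rng.randrange(num_arms)
        else:
            candidate = choice[rng.choice(adjacency[agent])]
        # Step 2 (adopt): pull the candidate and keep it only if it pays off.
        reward = 1 if rng.random() < means[candidate] else 0
        if reward == 1:
            choice[agent] = candidate
    return choice

# Hypothetical example: 10 agents on a complete graph, two arms.
complete = {i: [j for j in range(10) if j != i] for i in range(10)}
final = sample_and_adopt(complete, means=[0.2, 0.8], steps=5000)
print(final)
```

Copying neighbours spreads arms that are already popular, while the adoption test favours arms with higher mean reward, so better arms tend to gain followers over time. Each agent keeps a single arm index, in line with the bounded-memory constraint stated in the introduction.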
Generating the selection matrix
In this section, we describe how to produce the probability selection matrix using the Maximum-Degree (MD) matrix and the Metropolis–Hastings (MH) matrix. The two matrices were originally proposed for analyzing Markov chains, for example in [50]. The MD and MH approaches can be implemented in different scenarios. Specifically, in the MD approach, each agent needs to know the maximum degree and its own degree, while in the MH approach, only the node's own degree and its neighbors' degrees are needed. We will
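The two constructions can be sketched using the textbook forms of these matrices from the Markov-chain literature: MD assigns weight 1/d_max to every edge, and MH assigns 1/max(d_i, d_j) to edge (i, j), with the remaining probability mass placed on the diagonal. The paper's exact definitions are not shown in this excerpt, so these formulas, the function names, and the example graph are assumptions.

```python
def degrees(adjacency):
    """Map each agent to its number of neighbours."""
    return {i: len(nbrs) for i, nbrs in adjacency.items()}

def max_degree_matrix(adjacency):
    """P[i][j] = 1/d_max on edges; the leftover mass stays on the diagonal.

    Assumes a graph with at least one edge and no self-loops.
    """
    deg = degrees(adjacency)
    d_max = max(deg.values())
    nodes = sorted(adjacency)
    P = {i: {j: 0.0 for j in nodes} for i in nodes}
    for i in nodes:
        for j in adjacency[i]:
            P[i][j] = 1.0 / d_max
        P[i][i] = 1.0 - deg[i] / d_max
    return P

def metropolis_hastings_matrix(adjacency):
    """P[i][j] = 1/max(d_i, d_j) on edges; needs only local degree information."""
    deg = degrees(adjacency)
    nodes = sorted(adjacency)
    P = {i: {j: 0.0 for j in nodes} for i in nodes}
    for i in nodes:
        for j in adjacency[i]:
            P[i][j] = 1.0 / max(deg[i], deg[j])
        P[i][i] = 1.0 - sum(P[i][j] for j in adjacency[i])
    return P

# Hypothetical 5-agent tree: agent 0 is linked to 1, 2, 3; agent 3 is linked to 4.
tree = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
P_md = max_degree_matrix(tree)
P_mh = metropolis_hastings_matrix(tree)
for name, P in [("MD", P_md), ("MH", P_mh)]:
    print(name)
    for i in sorted(P):
        print("  ", [round(P[i][j], 3) for j in sorted(P)])
```

In this sketch both matrices are symmetric with rows summing to one. They differ on the edge between agents 3 and 4, where MH gives a larger weight because neither endpoint has the maximum degree. This mirrors the information requirements stated above: MD needs the global maximum degree, MH only the degrees of the two endpoints.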
Analysis
In this section, we analyze the convergence of our proposed collaborative learning process.
Experiments
In this section, we evaluate the performance of the algorithm through simulation experiments. We analyze the impact of the parameters, including the number of arms and the number of agents , on the performance of the learning algorithm. The two approaches for setting up the probability selection matrix , the Maximum-Degree algorithm (MD) and the Metropolis–Hastings algorithm (MH), are compared in the dynamic setting.
We use the proportion of edges that change with time in the communication model to define the
Conclusion
In this paper, we presented a distributed best arm identification algorithm for dynamic IoT networks whose topology may change over time. Our algorithm is designed for the mobile IoT environment, so that each agent can learn the best option with lightweight computation, limited memory, and asynchronous communication. Specifically, two different message-passing schemes are provided. Extensive experiments showed that the proposed algorithm converges quickly and is practical to implement.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is partially supported by the National Key R&D Program of China (Grant No. 2019YFB2102600), NSFC, China (No. 61971269, 61832012, 61702304), the Shandong Provincial Natural Science Foundation, China (Grant No. ZR2017QF005), and the Industrial Internet Innovation and Development Project in 2019 of China.
Shuzhen Chen received the B.Sc. degree in 2019 from the School of Computer Science and Technology, Shandong University. She is currently a postgraduate student in Department of Computer Science and Technology, Shandong University. Her research interests include distributed computing, wireless and mobile security.
References (50)
- et al., A survey and taxonomy on energy management schemes in wireless sensor networks, J. Syst. Archit. (2020)
- et al., On-demand deployment for IoT applications, J. Syst. Archit. (2020)
- et al., Cognitive computing and wireless communications on the edge for healthcare service robots, Comput. Commun. (2020)
- et al., A survey on edge intelligence (2020)
- Z. Ren, S. Liang, P. Li, S. Wang, M. de Rijke, Social collaborative viewpoint regression with explainable...
- Z. Ren, M.-H. Peetz, S. Liang, W. Van Dolen, M. De Rijke, Hierarchical multi-label classification of social text...
- Z. Ren, M. de Rijke, Summarizing contrastive themes via hierarchical non-parametric processes, in: Proceedings of the...
- et al., Reliable data storage in heterogeneous wireless sensor networks by jointly optimizing routing and storage node deployment, Tsinghua Sci. Technol. (2020)
- et al., Implementing abstract MAC layer in dynamic networks, IEEE Trans. Mob. Comput. (2020)
- et al., Faster parallel core maintenance algorithms in dynamic graphs, IEEE Trans. Parallel Distrib. Syst. (2019)
- Seed-free graph de-anonymization with adversarial learning
- Introduction to Probability Models
- Random Processes for Engineers
- PAC bounds for multi-armed bandit and Markov decision processes
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Found. Trends Mach. Learn.
- Privacy-preserving collaborative learning for multi-armed bandits in IoT, IEEE Internet Things J.
- Distributed social learning with imperfect information, IEEE Trans. Netw. Sci. Eng.
- A fast bandit algorithm for recommendation to users with heterogenous tastes
- Mortal multi-armed bandits
- Algorithms for multi-armed bandit problems
- Multi-armed-bandit-based spectrum scheduling algorithms in wireless networks: A survey, IEEE Wirel. Commun.
- Finite-time analysis of the multiarmed bandit problem, Mach. Learn.
- Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, J. Mach. Learn. Res.
- Best arm identification in multi-armed bandits
- Distributed algorithms for learning and cognitive medium access with logarithmic regret, IEEE J. Sel. Areas Commun.
Youming Tao is currently an undergraduate student in Taishan College, Shandong University, Shandong, China. He currently focuses on fundamental problems in decentralized machine learning algorithm design, distributed computing, mechanism design and privacy-preserving data analytics.
Dongxiao Yu received the B.Sc. degree in 2006 from the School of Mathematics, Shandong University and the Ph.D. degree in 2014 from the Department of Computer Science, The University of Hong Kong. He became an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, in 2016. He is currently a professor in the School of Computer Science and Technology, Shandong University. His research interests include wireless networks, distributed computing and graph algorithms.
Feng Li received his BS and MS degrees in Computer Science from Shandong Normal University, China, in 2007, and Shandong University, China, in 2010, respectively. He got his Ph.D. degree (also in Computer Science) from Nanyang Technological University, Singapore, in 2015. From 2014 to 2015, he worked as a research fellow in National University of Singapore, Singapore. He then joined School of Computer Science and Technology, Shandong University, China, where he is currently an associate professor. His research interests include distributed algorithms and systems, wireless networking, mobile computing, and Internet of Things.
Bei Gong received his B.S. degree from Shandong University in 2005 and the Ph.D. degree from the Beijing University of Technology in 2012. He has participated in six national invention patents and one monograph textbook. In the past five years, he has published more than 30 papers in SCI/EI-indexed international journals and top international conferences in relevant research fields. His research interests include trusted computing, Internet of Things security, mobile Internet of Things, and mobile edge computing. He has presided over 8 national projects, such as those of the National Natural Science Foundation, and 6 provincial and ministerial projects, such as the general science and technology program of the Beijing Municipal Education Commission.