Distributed learning dynamics of Multi-Armed Bandits for edge intelligence

doi:10.1016/j.sysarc.2020.101919

Journal of Systems Architecture

Volume 114, March 2021, 101919

https://doi.org/10.1016/j.sysarc.2020.101919

Abstract

Multi-agent decision making is a fundamental problem in edge intelligence. In this paper, we study this problem for IoT networks under the distributed Multi-Armed Bandits (MAB) model. Most existing works on distributed MAB require long-lived, stable networks of powerful devices and hence may not be suitable for mobile IoT networks with harsh resource constraints. To meet the challenge of resource constraints in mobile IoT environments, we propose a lightweight and robust learning algorithm for a dynamic network that allows topology changes. In our model, each agent is assumed to have only limited memory, and agents communicate with each other asynchronously. Moreover, we assume that the bandwidth for exchanging information is limited and each agent can transmit $O(\log_{2} K)$ bits ($K$ denotes the number of arms) per communication. Rigorous analysis shows that despite these harsh constraints, the best arm/option can be identified collaboratively by the agents and the algorithm converges efficiently. Extensive experiments illustrate that the proposed algorithm exhibits good efficiency and stability in mobile settings.

Introduction

Recent years have witnessed the prosperity of the Internet of Things (IoT), driven by rapid advances in chip integration, wireless communication, big data, and artificial intelligence. IoT devices, which are distributed at the network edge, allow data to be explored efficiently and facilitate data aggregation through local computation and communication with each other. With these merits, IoT seems to be a good fit for implementing large-scale machine learning algorithms and developing edge intelligence, which in turn helps to complete IoT tasks such as decision making and learning the underlying environment. Hence, the learning paradigm has recently shifted from centralized cloud computing to edge computing (fog computing), which results in edge intelligence. Edge intelligence refers to a set of connected edge devices (e.g., IoT devices, smartphones, UAVs) that is used for data collection, caching, processing, and analysis based on artificial intelligence [1]. However, traditional machine learning algorithms are typically demanding in computing resources [2], [3], e.g., CPU, GPU, and memory, which contrasts with the lightweight computation capacity, limited memory, and mobility of edge devices. To push computation tasks from the cloud server to the edge, we have to devise lightweight learning algorithms [4], [5], [6] for the edge (e.g., IoT networks).

Multi-agent decision making is a common task in edge intelligence. We employ distributed multi-armed bandits (MAB) to formalize our multi-agent decision-making problem, where multiple networked agents face a set of arms (options) and each arm is represented by a stochastic reward with a mean unknown to the agents. The agents' goal is to learn the best arm/option collaboratively by iteratively exchanging local states or opinions with each other until reaching a consensus. In fact, distributed MAB algorithms for large-scale networks have been studied extensively (see Section 2). Unfortunately, most of these works require long-lived, stable networks of powerful devices, which makes them unsuitable for edge intelligence. A natural question is whether an efficient and robust collaborative learning algorithm can be devised that overcomes the constraints of IoT devices and can be implemented even in mobile IoT networks.

To settle this problem, we present in this paper a collaborative learning algorithm for dynamic IoT networks. In particular, the network is modeled as a connected network over a fixed set of agents whose topology changes arbitrarily over time [7], [8], [9], [10]. To make the model suitable for IoT, we also impose some necessary constraints on it. Firstly, we assume that each agent has only bounded memory, so that no learning history can be stored. Secondly, the bandwidth of the communication channel is limited, so that each agent can transmit at most $\log_{2} K$ bits ($K$ denotes the number of arms) per communication. Lastly, time is assumed to be continuous and agents take actions asynchronously: each agent is equipped with an independent Poisson clock, and all clocks share a common parameter. The Poisson clock model is natural for modeling swarm intelligence systems that are not fully synchronized [11], [12].
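To make the asynchronous activation model concrete, the following Python sketch simulates independent Poisson clocks with a common rate and computes how many bits suffice to encode one arm index. The function names (`poisson_clock_ticks`, `bits_for_arm_index`), the rate, and the time horizon are illustrative assumptions and are not part of the proposed algorithm.

```python
import random


def poisson_clock_ticks(n_agents, rate, horizon, rng):
    """Return the merged, time-ordered list of (time, agent) activations.

    Each agent owns an independent Poisson clock with the same rate, so the
    gaps between its consecutive ticks are i.i.d. exponential variables.
    """
    events = []
    for agent in range(n_agents):
        t = rng.expovariate(rate)          # time of the first tick
        while t <= horizon:
            events.append((t, agent))
            t += rng.expovariate(rate)     # exponential gap to the next tick
    events.sort()                          # global order of asynchronous actions
    return events


def bits_for_arm_index(num_arms):
    """Bits needed to encode one arm index in {0, ..., num_arms - 1}."""
    # (K - 1).bit_length() equals ceil(log2 K) for K >= 2; use 1 bit for K = 1.
    return max(1, (num_arms - 1).bit_length())


if __name__ == "__main__":
    rng = random.Random(2021)              # hypothetical seed
    ticks = poisson_clock_ticks(n_agents=4, rate=1.0, horizon=5.0, rng=rng)
    for time, agent in ticks[:5]:
        print(f"t = {time:.3f}: agent {agent} acts")
    print("bits per message for K = 10 arms:", bits_for_arm_index(10))
```

Because the clocks are independent and continuous, two agents tick at exactly the same instant with probability zero, so the merged list gives a well-defined order in which agents act.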

Under the dynamic network model described above, we propose a distributed collaborative learning algorithm that can be implemented in dynamic networks under the IoT constraints. Detailed and rigorous analysis shows that, with high probability, every agent eventually pulls only the best arm. We also conducted extensive experiments to confirm the efficacy of the proposed distributed learning algorithm. The results show that the learning algorithm converges quickly, which indicates that it is robust and efficient in realistic settings.

The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 presents our system model. Sections 4 and 5 present the learning algorithm. We give the detailed analysis in Section 6. Section 7 reports the practical performance evaluation. Finally, we conclude the paper in Section 8.

Section snippets

Related work

In recent years, multi-armed bandit (MAB) algorithms have received extensive attention and in-depth research, given that MAB settings are not only challenging in theory [13], [14], [15], [16] but also useful in practice [17], [18], [19], [20]. The basic MAB setting is concerned with identifying a single optimal arm, whose reward is unknown a priori, among a group of candidate arms by trying one arm at a time and observing the rewards [21]. In the centralized setting, there are two main research directions ...

Model

We consider a dynamic network consisting of $N$ agents (also called nodes). The communication model is represented by an undirected graph $G = (\mathcal{N}, E)$, where $\mathcal{N} = \{1, \dots, N\}$ is the set of agents and $E$ is the set of edges. The agents exchange messages with their neighbors in the graph $G$. Let $V_{i} = \{i' \in \mathcal{N} \mid (i, i') \in E\}$ denote the set of neighbors of agent $i$. The topology of the communication network can change arbitrarily over time. More specifically, the edges of the graph change constantly, while the set of agents stays fixed. The neighbors of ...
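As an illustration of this communication model, the Python sketch below rebuilds the neighbor sets $V_{i}$ from the current edge set each time the topology changes and checks that the snapshot is connected. The helper names (`neighbor_sets`, `is_connected`) and the two example snapshots are hypothetical and chosen only for this illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def neighbor_sets(n_agents: int, edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Return V_i for every agent i in {1, ..., n_agents} given the edge set E."""
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(1, n_agents + 1)}
    for i, j in edges:
        if i != j:                      # ignore self-loops
            neighbors[i].add(j)         # the graph is undirected,
            neighbors[j].add(i)         # so store both directions
    return neighbors


def is_connected(neighbors: Dict[int, Set[int]]) -> bool:
    """Breadth-first search check that every agent is reachable."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(neighbors)


if __name__ == "__main__":
    # Two hypothetical snapshots over the same fixed set of four agents.
    snapshot_t0 = [(1, 2), (2, 3), (3, 4)]
    snapshot_t1 = [(1, 3), (3, 2), (2, 4)]
    for edges in (snapshot_t0, snapshot_t1):
        V = neighbor_sets(4, edges)
        print("neighbors of agent 2:", sorted(V[2]), "connected:", is_connected(V))
```

Only the edge list differs between the snapshots; the agent set stays fixed, which matches the assumption that topology changes affect edges but not nodes.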

Algorithm

In order to identify the best arm $a_{1}$ efficiently, each agent collaborates with its neighbors. The two-step distributed learning dynamics, consisting of sampling and adopting, has been widely used in the literature [48], [49]. In previous works, however, the convergence of this learning process relies heavily on strong constraints on the network topology, which makes it hard to implement in general scenarios, let alone dynamic settings. We here adapt the two-step learning process into a three-step ...
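For readers unfamiliar with sample-and-adopt dynamics, the Python sketch below shows one generic two-step variant on a static graph. It is not the three-step process proposed in this paper, whose details are not reproduced in this snippet. The Bernoulli rewards, the uniform neighbor choice, the rule "adopt when the reward is 1", and the name `sample_and_adopt` are all assumptions made for illustration.

```python
import random


def sample_and_adopt(neighbors, means, opinions, steps, rng):
    """Run a generic two-step sample-and-adopt dynamics.

    neighbors: list of neighbor lists, one per agent (0-based indices)
    means:     assumed Bernoulli reward mean of each arm
    opinions:  initial arm held by each agent (the only state an agent keeps)
    """
    opinions = list(opinions)                 # do not mutate the caller's list
    n = len(opinions)
    for _ in range(steps):
        i = rng.randrange(n)                  # the agent whose clock ticks
        if not neighbors[i]:
            continue                          # isolated agent: nothing to sample
        # Sampling step: observe the arm currently held by a random neighbor.
        j = rng.choice(neighbors[i])
        arm = opinions[j]
        # Pull the sampled arm once and observe a Bernoulli reward.
        reward = 1 if rng.random() < means[arm] else 0
        # Adopting step: keep the sampled arm only if the pull was rewarded.
        if reward == 1:
            opinions[i] = arm
    return opinions


if __name__ == "__main__":
    rng = random.Random(7)                    # hypothetical seed
    n_agents = 8
    complete = [[j for j in range(n_agents) if j != i] for i in range(n_agents)]
    initial = [i % 3 for i in range(n_agents)]   # three arms, spread over agents
    final = sample_and_adopt(complete, [0.9, 0.5, 0.2], initial, 2000, rng)
    print("arms held after 2000 activations:", final)
```

Each agent stores a single arm index and sends only that index to a neighbor, which is in the spirit of the bounded-memory and limited-bandwidth constraints stated in the Introduction.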

Generating the selection matrix

In this section, we introduce how to produce the probability selection matrix $P$ using the Maximum-Degree (MD) matrix and the Metropolis–Hastings (MH) matrix. The two matrices are traditionally used in the analysis of Markov chains, such as in [50]. The MD and MH approaches can be implemented in different scenarios. Specifically, in the MD approach, each agent needs to know the maximum degree and its own degree, while in the MH approach, only the agent's own degree and its neighbors' degrees are needed. We will ...
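The two constructions can be sketched in Python as follows. The exact definitions used in this paper are not visible in this snippet, so the sketch assumes the commonly used textbook forms: for an edge $(i, j)$, MD sets $P_{ij} = 1/d_{\max}$ and MH sets $P_{ij} = 1/\max(d_i, d_j)$, and in both cases the diagonal entry absorbs the remaining probability mass. The function names and the example graph are hypothetical.

```python
def max_degree_matrix(neighbors):
    """Maximum-Degree weights: P_ij = 1/d_max on edges, rest on the diagonal.

    neighbors: list of neighbor sets, one per agent (0-based, no self-loops).
    Every agent needs the global maximum degree d_max and its own degree.
    """
    n = len(neighbors)
    d_max = max((len(v) for v in neighbors), default=0)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            P[i][j] = 1.0 / d_max
        P[i][i] = 1.0 - sum(P[i])          # remaining mass stays at agent i
    return P


def metropolis_hastings_matrix(neighbors):
    """Metropolis-Hastings weights: P_ij = 1/max(d_i, d_j) on edges.

    Every agent needs only its own degree and its neighbors' degrees.
    """
    n = len(neighbors)
    deg = [len(v) for v in neighbors]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            P[i][j] = 1.0 / max(deg[i], deg[j])
        P[i][i] = 1.0 - sum(P[i])          # remaining mass stays at agent i
    return P


if __name__ == "__main__":
    # Hypothetical 5-agent tree: 0-1, 1-2, 1-3, 3-4 (degrees 1, 3, 1, 2, 1).
    adjacency = [{1}, {0, 2, 3}, {1}, {1, 4}, {3}]
    for name, build in (("MD", max_degree_matrix), ("MH", metropolis_hastings_matrix)):
        print(name)
        for row in build(adjacency):
            print("  " + " ".join(f"{p:.3f}" for p in row))
```

Under these assumed definitions both matrices are symmetric with rows summing to one. The practical difference is the information required: MD needs a network-wide quantity ($d_{\max}$), whereas MH uses only degrees that can be learned from direct neighbors.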

Analysis

In this section, we analyze the convergence of our proposed collaborative learning process.

Experiments

In this section, we analyze the performance of the learning algorithm through simulation experiments. The impact of the parameters on the performance, including the number of arms $M$ and the number of agents $N$, is analyzed. The two approaches to setting up the probability selection matrix $P$, the Maximum-Degree algorithm (MD) and the Metropolis–Hastings algorithm (MH), are compared in the dynamic setting.

We use the proportion of edges that change with time in the communication model to define the ...

Conclusion

In this paper, we presented a distributed best arm identification algorithm for dynamic IoT networks that allow topology changes. Our algorithm is designed for the mobile IoT environment, such that each agent can learn the best option with lightweight computation, limited memory, and asynchronous communication. Specifically, two different ways of message passing are provided. Extensive experiments illustrated that the proposed algorithm converges fast and can be implemented well in practice.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Key R&D Program of China under grant No. 2019YFB2102600, NSFC, China (Nos. 61971269, 61832012, 61702304), the Shandong Provincial Natural Science Foundation, China (Grant No. ZR2017QF005), and the Industrial Internet Innovation and Development Project in 2019 of China.

Shuzhen Chen received the B.Sc. degree in 2019 from the School of Computer Science and Technology, Shandong University. She is currently a postgraduate student in Department of Computer Science and Technology, Shandong University. Her research interests include distributed computing, wireless and mobile security.

References (50)

Singh J. et al., A survey and taxonomy on energy management schemes in wireless sensor networks, J. Syst. Archit. (2020)
Zhang J. et al., On-demand deployment for IoT applications, J. Syst. Archit. (2020)
Wan S. et al., Cognitive computing and wireless communications on the edge for healthcare service robots, Comput. Commun. (2020)
Xu D. et al., A survey on edge intelligence (2020)
Z. Ren, S. Liang, P. Li, S. Wang, M. de Rijke, Social collaborative viewpoint regression with explainable...
Z. Ren, M.-H. Peetz, S. Liang, W. Van Dolen, M. De Rijke, Hierarchical multi-label classification of social text...
Z. Ren, M. de Rijke, Summarizing contrastive themes via hierarchical non-parametric processes, in: Proceedings of the...
Yang H. et al., Reliable data storage in heterogeneous wireless sensor networks by jointly optimizing routing and storage node deployment, Tsinghua Sci. Technol. (2020)
Yu D. et al., Implementing abstract MAC layer in dynamic networks, IEEE Trans. Mob. Comput. (2020)
Hua Q.-S. et al., Faster parallel core maintenance algorithms in dynamic graphs, IEEE Trans. Parallel Distrib. Syst. (2019)
Li K. et al., Seed-free graph de-anonymiztiation with adversarial learning
Ross S., Introduction to Probability Models (2014)
Hajek B., Random Processes for Engineers (2015)
Even-Dar E. et al., PAC bounds for multi-armed bandit and Markov decision processes
Bubeck S. et al., Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Found. Trends Mach. Learn. (2012)
Chen S. et al., Privacy-preserving collaborative learning for multi-armed bandits in IoT, IEEE Internet Things J. (2020)
Yuan Y. et al., Distributed social learning with imperfect information, IEEE Trans. Netw. Sci. Eng. (2020)
Kohli P. et al., A fast bandit algorithm for recommendation to users with heterogenous tastes
Chakrabarti D. et al., Mortal multi-armed bandits
Kuleshov V. et al., Algorithms for multi-armed bandit problems (2014)
Li F. et al., Multi-armed-bandit-based spectrum scheduling algorithms in wireless networks: A survey, IEEE Wirel. Commun. (2020)
Auer P., Finite-time analysis of the multiarmed bandit problem, Mach. Learn. (2002)
Even-Dar E. et al., Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, J. Mach. Learn. Res. (2006)
Audibert J. et al., Best arm identification in multi-armed bandits
Anandkumar A. et al., Distributed algorithms for learning and cognitive medium access with logarithmic regret, IEEE J. Sel. Areas Commun. (2011)


Youming Tao is currently an undergraduate student in Taishan College, Shandong University, Shandong, China. He currently focuses on fundamental problems in decentralized machine learning algorithm design, distributed computing, mechanism design and privacy-preserving data analytics.

Dongxiao Yu received the B.Sc. degree in 2006 from the School of Mathematics, Shandong University and the Ph.D. degree in 2014 from the Department of Computer Science, The University of Hong Kong. He became an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, in 2016. He is currently a professor in the School of Computer Science and Technology, Shandong University. His research interests include wireless networks, distributed computing and graph algorithms.

Feng Li received his BS and MS degrees in Computer Science from Shandong Normal University, China, in 2007, and Shandong University, China, in 2010, respectively. He got his Ph.D. degree (also in Computer Science) from Nanyang Technological University, Singapore, in 2015. From 2014 to 2015, he worked as a research fellow in National University of Singapore, Singapore. He then joined School of Computer Science and Technology, Shandong University, China, where he is currently an associate professor. His research interests include distributed algorithms and systems, wireless networking, mobile computing, and Internet of Things.

Bei Gong received his B.S. degree from Shandong University in 2005 and the Ph.D. degree from the Beijing University of Technology in 2012. He has participated in six national invention patents and one monograph textbook. In the past five years, he has published more than 30 papers in leading SCI/EI-indexed international journals and top international conferences in relevant research fields. His research interests include trusted computing, Internet of Things security, mobile Internet of Things, and mobile edge computing. He has presided over 8 national projects, such as those of the National Natural Science Foundation, and 6 provincial and ministerial projects, such as the general science and technology program of the Beijing Municipal Education Commission.
