Fast Q-learning for Improved Finite Length Performance of Irregular Repetition Slotted ALOHA
IEEE Transactions on Cognitive Communications and Networking (IF 8.6), Pub Date: 2020-06-01, DOI: 10.1109/tccn.2019.2957224
Eleni Nisioti, Nikolaos Thomos

In this paper, we study the problem of designing adaptive Medium Access Control (MAC) solutions for wireless sensor networks (WSNs) under the Irregular Repetition Slotted ALOHA (IRSA) protocol. In particular, we optimize the degree distribution employed by IRSA for finite frame sizes. Motivated by characteristics of WSNs, such as the restricted computational resources and partial observability, we model the design of IRSA as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). We have theoretically analyzed our solution in terms of optimality of the learned IRSA design and derived guarantees for finding near-optimal policies. These guarantees are generic and can be applied in resource allocation problems that exhibit the waterfall effect, which in our setting manifests itself as a severe degradation in the overall throughput of the network above a particular channel load. Furthermore, we combat the inherent non-stationarity of the learning environment in WSNs by advancing classical Q-learning through the use of virtual experience (VE), a technique that enables the update of multiple state-action pairs per learning iteration and, thus, accelerates convergence. Our simulations confirm the superiority of our learning-based MAC solution compared to traditional IRSA and provide insights into the effect of WSN characteristics on the quality of learned policies.
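The abstract assumes familiarity with how an IRSA frame is decoded. The sketch below (Python, standard library only) simulates one frame under the textbook IRSA model: each of M users draws a repetition degree d from a degree distribution, transmits d replicas in d distinct slots of an N-slot frame, and the receiver runs iterative successive interference cancellation (SIC). Sweeping the channel load G = M/N illustrates the waterfall effect the abstract describes. The collision-channel assumption (a slot is decodable only when it holds exactly one replica), the function names `simulate_irsa_frame` and `average_throughput`, and the example distribution `EXAMPLE_DEGREE_DIST` are all hypothetical choices for illustration; this is not the authors' Dec-POMDP formulation or their Q-learning with virtual experience.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical example degree distribution Lambda(x) = 0.5x^2 + 0.28x^3 + 0.22x^8,
# given as (degree, probability) pairs. Chosen for illustration only.
EXAMPLE_DEGREE_DIST: List[Tuple[int, float]] = [(2, 0.5), (3, 0.28), (8, 0.22)]


def simulate_irsa_frame(num_users: int, num_slots: int,
                        degree_dist: Sequence[Tuple[int, float]],
                        rng: random.Random) -> int:
    """Simulate one IRSA frame on a collision channel; return users decoded by SIC."""
    degrees = [d for d, _ in degree_dist]
    weights = [p for _, p in degree_dist]
    slots: List[Set[int]] = [set() for _ in range(num_slots)]  # users per slot
    user_slots: List[List[int]] = []  # slots chosen by each user
    for user in range(num_users):
        d = rng.choices(degrees, weights=weights)[0]  # draw repetition degree
        chosen = rng.sample(range(num_slots), d)      # d distinct slots
        user_slots.append(chosen)
        for s in chosen:
            slots[s].add(user)

    decoded: Set[int] = set()
    progress = True
    while progress:  # repeat until no singleton slot remains
        progress = False
        for s in range(num_slots):
            if len(slots[s]) == 1:  # clean slot: this packet is decodable
                user = next(iter(slots[s]))
                decoded.add(user)
                for t in user_slots[user]:  # cancel every replica of this user
                    slots[t].discard(user)
                progress = True
    return len(decoded)


def average_throughput(load: float, num_slots: int,
                       degree_dist: Sequence[Tuple[int, float]],
                       trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of throughput (decoded packets per slot) at load G = M/N."""
    rng = random.Random(seed)
    num_users = round(load * num_slots)
    total = sum(simulate_irsa_frame(num_users, num_slots, degree_dist, rng)
                for _ in range(trials))
    return total / (trials * num_slots)


if __name__ == "__main__":
    for load in (0.3, 0.5, 0.7, 0.9, 1.1):
        t = average_throughput(load, 100, EXAMPLE_DEGREE_DIST)
        print(f"G = {load:.1f}  throughput = {t:.3f}")
```

Under these assumptions, throughput should stay close to G at light load and then collapse once G passes a threshold that depends on the degree distribution and the frame size. Reducing `num_slots` is a quick way to see why the finite-frame regime studied here may call for its own choice of degree distribution.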

Updated: 2020-06-01