Meta-Reinforcement Learning for Reliable Communication in THz/VLC Wireless VR Networks,IEEE Transactions on Wireless Communications


IEEE Transactions on Wireless Communications (IF 8.9), Pub Date: 2022-03-31, DOI: 10.1109/twc.2022.3161970
Yining Wang, Mingzhe Chen, Zhaohui Yang, Walid Saad, Tao Luo, Shuguang Cui, H. Vincent Poor


This paper studies the problem of enhancing the quality of virtual reality (VR) services in an indoor terahertz (THz)/visible light communication (VLC) wireless network. In the studied model, small base stations (SBSs) transmit high-quality VR images to VR users over THz bands, and light-emitting diodes (LEDs) provide accurate indoor positioning services for those users using VLC. VR users move in real time, their movement patterns change over time according to their applications, and both THz and VLC links can be blocked by the bodies of VR users. To control the energy consumption of the THz/VLC wireless VR network, VLC access points (VAPs) must be selectively turned on so as to ensure accurate and extensive positioning for VR users. Based on the user positions, each SBS must generate the corresponding VR images and establish THz links free of body blockage to transmit the VR content. The problem is formulated as an optimization problem whose goal is to maximize the average number of successfully served VR users by selecting the VAPs to be turned on and controlling the user association with SBSs. To solve this problem, a policy gradient-based reinforcement learning (RL) algorithm that adopts a meta-learning approach is proposed. The proposed meta policy gradient (MPG) algorithm enables the trained policy to adapt quickly to new user movement patterns. For VR scenarios with large numbers of users, a low-complexity dual method-based MPG algorithm (D-MPG) is also proposed. Simulation results demonstrate that, compared to a baseline trust region policy optimization (TRPO) algorithm, the proposed MPG and D-MPG algorithms yield up to 26.8% and 21.9% improvement, respectively, in the average number of successfully served users, as well as 81.2% and 87.5% gains, respectively, in convergence speed.
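
The idea of a meta-trained policy that adapts quickly to a new movement pattern can be illustrated with a toy sketch. The code below is not the paper's MPG or D-MPG algorithm: it assumes a Reptile-style first-order meta-update (the abstract does not say which meta-learning rule is used), reduces each movement pattern to a hypothetical table of success probabilities for four candidate VAP/association configurations, and uses the exact gradient of a softmax policy in place of sampled trajectories so that the run is deterministic. All names and numbers (`TASKS`, `NEW_TASK`, the learning rates) are illustrative assumptions.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, success_prob):
    # Expected success rate when one configuration is drawn from the softmax policy.
    return sum(p * r for p, r in zip(softmax(logits), success_prob))


def policy_gradient(logits, success_prob):
    # Exact policy gradient for a softmax policy on a one-step problem:
    # dJ/dtheta_a = pi_a * (r_a - J). A real system would estimate this from samples.
    pi = softmax(logits)
    j = sum(p * r for p, r in zip(pi, success_prob))
    return [p * (r - j) for p, r in zip(pi, success_prob)]


def adapt(logits, success_prob, steps=5, lr=0.3):
    # Inner loop: a few gradient-ascent steps on a single task (movement pattern).
    theta = list(logits)
    for _ in range(steps):
        grad = policy_gradient(theta, success_prob)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def meta_train(tasks, n_actions, meta_iters=200, meta_lr=0.5):
    # Outer loop (Reptile-style, first-order; an assumption of this sketch):
    # move the shared initialization toward each task's adapted parameters.
    meta = [0.0] * n_actions
    for i in range(meta_iters):
        adapted = adapt(meta, tasks[i % len(tasks)])
        meta = [m + meta_lr * (a - m) for m, a in zip(meta, adapted)]
    return meta


# Hypothetical success probabilities of 4 configurations under 3 training patterns.
TASKS = [
    [0.9, 0.6, 0.2, 0.1],
    [0.7, 0.9, 0.3, 0.1],
    [0.8, 0.7, 0.2, 0.2],
]
# Hypothetical unseen movement pattern used to compare adaptation speed.
NEW_TASK = [0.6, 0.9, 0.1, 0.2]

meta_init = meta_train(TASKS, n_actions=4)
scratch = adapt([0.0] * 4, NEW_TASK)   # same adaptation budget, uniform start
warm = adapt(meta_init, NEW_TASK)      # same adaptation budget, meta-trained start

print("from scratch:", round(expected_reward(scratch, NEW_TASK), 3))
print("from meta-init:", round(expected_reward(warm, NEW_TASK), 3))
```

With the same five adaptation steps, the meta-trained start should reach a higher expected success rate on the unseen pattern than the uniform start, which is the qualitative behavior the abstract attributes to meta-learning. The sketch says nothing about the reported 26.8%/21.9% gains, which come from the paper's own simulations.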


Updated: 2022-03-31
