MMDP: A Mobile-IoT Based Multi-modal Reinforcement Learning Service Framework, IEEE Transactions on Services Computing


MMDP: A Mobile-IoT Based Multi-modal Reinforcement Learning Service Framework
IEEE Transactions on Services Computing (IF 5.5), Pub Date: 2020-07-01, DOI: 10.1109/tsc.2020.2964663
Puming Wang, Laurence T. Yang, Jintao Li, Xue Li, Xiaokang Zhou

With the development of GPS technology, a new Mobile Internet of Things (M-IoT) is emerging, which perceives the city's rhythm and pulse day and night and collects large-scale city data. An M-IoT service system that can handle these large-scale, heterogeneous data is urgently needed. To address this problem, this article proposes a Mobile-IoT based multi-modal reinforcement learning service framework from a data perspective, with three highlights: i) an Action-aware High-order Transition Tensor (AHTT) that fuses the heterogeneous data from M-IoTs in a unified form; ii) a Multi-modal Markov Decision Process (MMDP) that models multi-modal reinforcement learning for the M-IoT service framework; and iii) a Tensor Policy Iteration Algorithm (TPIA) that solves for the optimal tensor policy. Because tensors preserve the multi-modal relations of the context information while the optimal policy is being solved, the proposed M-IoT service system provides more personalized service for taxi drivers. The experimental results show that most taxi drivers earn more revenue when following the tensor policy.
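To make the idea of solving a policy directly on a transition tensor more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's TPIA: it is ordinary policy iteration, written so that the state keeps two separate modes instead of being flattened into one index. The choice of modes (a location index and a time-slot index), the names `T`, `R`, and `tensor_policy_iteration`, the discount factor, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def tensor_policy_iteration(T, R, gamma=0.9, tol=1e-8, max_iter=1000):
    """Standard policy iteration on a two-mode state space (illustrative only).

    T: transition tensor, shape (A, L, H, L, H);
       T[a, l, h, m, k] = P(next state (m, k) | state (l, h), action a).
    R: reward tensor, shape (A, L, H).
    Returns a policy tensor pi (L, H) of action indices and a value tensor V (L, H).
    """
    A, L, H = R.shape
    li, hi = np.indices((L, H))          # index grids for per-state selection
    pi = np.zeros((L, H), dtype=int)     # start with action 0 everywhere
    V = np.zeros((L, H))

    for _ in range(max_iter):
        # Policy evaluation: iterate V <- R_pi + gamma * T_pi V until stable.
        T_pi = T[pi, li, hi]             # shape (L, H, L, H)
        R_pi = R[pi, li, hi]             # shape (L, H)
        while True:
            V_new = R_pi + gamma * np.einsum("lhmk,mk->lh", T_pi, V)
            delta = np.max(np.abs(V_new - V))
            V = V_new
            if delta < tol:
                break

        # Policy improvement: act greedily with respect to the action values.
        Q = R + gamma * np.einsum("alhmk,mk->alh", T, V)
        new_pi = Q.argmax(axis=0)
        if np.array_equal(new_pi, pi):   # policy is stable, so stop
            break
        pi = new_pi

    return pi, V


# Toy problem (hypothetical sizes): 2 actions, 4 locations, 3 time slots.
rng = np.random.default_rng(0)
A, L, H = 2, 4, 3
T = rng.random((A, L, H, L, H))
T /= T.sum(axis=(3, 4), keepdims=True)   # normalise into probabilities
R = rng.random((A, L, H))

pi, V = tensor_policy_iteration(T, R)
print(pi)  # one action index per (location, time slot) pair
```

Keeping the location and time-slot modes as separate axes means the resulting policy can be read off per mode combination, which is the general motivation the abstract gives for working with tensors rather than a flattened state index.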


Updated: 2020-07-01
