Inhomogeneous deep Q-network for time sensitive applications
Artificial Intelligence (IF 14.4), Pub Date: 2022-07-15, DOI: 10.1016/j.artint.2022.103757
Xu Chen, Jun Wang

Deep Q-network (DQN) has attracted increasing attention from both industry and academia. Existing methods mostly formulate the decision process as discrete agent-environment interactions and largely neglect the intervals between successive interactions, even though these intervals can carry important signals in real-world applications. To bridge this gap, this paper proposes to explicitly model the time intervals in DQN. Specifically, we first cast the agent-environment interactions onto a continuous time dimension, and then define a time-aware learning objective and the corresponding Bellman operator. For sample-efficient training, we approximate the Q-function with a neural network in which the time information is modeled by a point process. The intensity function of the point process and the Q-function are integrated by sharing the same history summarization module, so that the time interval information can directly influence the model optimization process. To close the gap between the approximated and optimal Q-function, we theoretically analyze the sample complexity of our model by deriving a finite-time bound in continuous time. We conduct both simulation and real-world experiments to demonstrate our model's effectiveness.
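
The abstract does not state the exact form of the time-aware objective, so the following is only a minimal sketch of one common way to make a Bellman target depend on the interval between interactions: the discount is taken as exp(-beta * dt), as in standard continuous-time (semi-Markov) discounting. The function names, the parameter beta, and the tabular update are assumptions made for illustration and are not taken from the paper; the paper's neural Q-function and point-process intensity are not modeled here.

```python
import math


def time_aware_target(reward, dt, next_q_values, beta=0.1, done=False):
    """TD target whose discount depends on the elapsed time dt.

    Assumed form (not from the paper): r + exp(-beta * dt) * max_a Q(s', a).
    """
    if dt < 0:
        raise ValueError("dt must be non-negative")
    if done:
        # No bootstrapping from a terminal transition.
        return reward
    discount = math.exp(-beta * dt)  # shrinks toward 0 as the gap grows
    return reward + discount * max(next_q_values)


def q_update(q, state, action, target, lr=0.5):
    """One tabular Q-learning step toward the given target."""
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]


# Two transitions with identical rewards and next-state values
# but different gaps between interactions.
next_q = [1.0, 2.0]
short_gap = time_aware_target(reward=1.0, dt=0.5, next_q_values=next_q)
long_gap = time_aware_target(reward=1.0, dt=5.0, next_q_values=next_q)
```

Under this assumed form, the transition with the longer gap receives the smaller target, which is one way interval information could influence optimization; with dt = 0 the target reduces to the undiscounted reward plus the best next-state value.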




Updated: 2022-07-15