Learning-based Demand Response for Privacy-Preserving Users
IEEE Transactions on Industrial Informatics (IF 12.3), Pub Date: 2019-09-01, DOI: 10.1109/tii.2019.2898462
Amir Ghasemkhani, Lei Yang, Junshan Zhang

Demand response (DR), as a vital component of the smart grid, plays an important role in shaping load profiles to improve system reliability and efficiency. Incentive-based DR has been adopted in many DR programs, incentivizing customers to adapt their loads to supply availability. However, users' behavior patterns can be easily identified from the fine-grained power consumption data exchanged with the load serving entity (LSE), giving rise to serious privacy concerns. One common approach to addressing these privacy threats is to add perturbations to users' load measurements. Although perturbation protects users' privacy, the modified usage data degrades the LSE's ability to achieve an optimal incentive strategy, because the characteristics of the added perturbations are unknown to the LSE. In this paper, we cast the incentive-based DR problem as a stochastic Stackelberg game. To tackle the challenge induced by users' privacy-protection behaviors, we propose a two-timescale reinforcement learning algorithm that learns the optimal incentive strategy under users' perturbed responses. The proposed algorithm estimates the expected utility cost, mitigating the impact of the random characteristics of the added perturbations, and then updates the incentive strategy based on the estimated expected utility cost. We derive conditions under which the proposed incentive scheme converges almost surely to an $\epsilon$-optimal strategy. The efficacy of the proposed algorithm is demonstrated through extensive numerical simulations on real data.
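The abstract does not spell out the algorithm, but the two-timescale idea it describes can be illustrated with a minimal sketch: a fast loop averages noisy cost observations to estimate the expected utility cost under the user's perturbed responses, while a slow loop updates the incentive using that estimate. Everything below (true_response, perturb, lse_cost, the step-size exponents, the one-sided finite-difference gradient probe) is a hypothetical stand-in for illustration, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Hypothetical stand-ins, not from the paper -------------------
    def true_response(p):
        # Toy linear demand reduction for incentive price p
        return 2.0 * p

    def perturb(x):
        # Privacy-preserving report: e.g., Laplace noise on the measurement
        return x + rng.laplace(scale=0.5)

    def lse_cost(p, reduction, target=4.0):
        # LSE pays p per unit reduced and penalizes missing a target
        return p * reduction + (target - reduction) ** 2

    p = 1.0       # incentive (slow-timescale iterate)
    J_hat = 0.0   # running estimate of the expected utility cost (fast)

    for t in range(1, 20001):
        a_t = 1.0 / t ** 0.6   # fast step size
        b_t = 1.0 / t ** 0.9   # slow step size; b_t / a_t -> 0

        # Fast timescale: average noisy cost observations so the
        # estimate tracks the *expected* cost under perturbed responses.
        report = perturb(true_response(p))
        J_hat += a_t * (lse_cost(p, report) - J_hat)

        # Slow timescale: one-sided finite-difference gradient estimate
        # of the expected cost, then a projected incentive update.
        delta = 0.2
        probe = perturb(true_response(p + delta))
        grad_est = (lse_cost(p + delta, probe) - J_hat) / delta
        p = max(0.0, p - b_t * grad_est)

    print(f"learned incentive ~ {p:.3f}")

The design choice that makes this a two-timescale scheme is the step-size separation b_t / a_t -> 0: the cost estimate J_hat converges quickly relative to the incentive iterate, so the slow update effectively sees the expected cost rather than individual perturbed observations, which mirrors the abstract's description of mitigating the randomness of the added perturbations.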

Updated: 2019-09-01