
Rebalancing the car-sharing system with reinforcement learning


Abstract

With the boom of the sharing economy, the number of car-sharing corporations has increased notably, providing a variety of travel options and improving convenience and functionality. Because urban residents share similar travel patterns, car-sharing systems often face a spatial imbalance in the distribution of shared cars, especially during rush hours. Redressing this imbalance poses many challenges, such as insufficient data and a large state space. In this study, we propose a new reward method called the Double P (Picking & Parking) Bonus (DPB). We model the research problem as a Markov Decision Process (MDP) and introduce Deep Deterministic Policy Gradient (DDPG), a state-of-the-art reinforcement learning framework, to find a solution. The results show that the rewarding mechanism embodied in the DPB method can indeed guide users' behavior through price leverage, increase user stickiness, and cultivate user habits, thereby boosting the service provider's long-term profit. In addition, taking the battery level of the shared cars into consideration, we use hierarchical reinforcement learning for station scheduling. This scheduling method encourages users to park cars that need charging at the charging posts within a station, ensuring the effective use of charging-post resources and thus the efficient operation of the shared cars.
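Although the paper's exact reward design is not reproduced on this page, the minimal Python sketch below illustrates the general idea behind a Picking & Parking bonus: a trip earns a picking bonus for taking a car from an overfull station and a parking bonus for returning it to an underfull one, so that price leverage nudges users toward rebalancing the fleet. The Station fields, the 0.5 pressure threshold, and the bonus_scale parameter are hypothetical choices made for illustration, not the authors' actual formulation.

from dataclasses import dataclass

@dataclass
class Station:
    stock: int       # shared cars currently parked at the station
    capacity: int    # parking slots at the station

def dpb_reward(origin: Station, dest: Station,
               base_fare: float, bonus_scale: float = 1.0) -> float:
    """Operator's per-trip reward under a Double P Bonus scheme (sketch).

    A picking bonus is granted for taking a car from an overfull station,
    and a parking bonus for returning it to an underfull one, so user
    behavior is steered toward rebalancing through price leverage.
    """
    pick_pressure = origin.stock / origin.capacity       # high -> overfull
    park_pressure = 1.0 - dest.stock / dest.capacity     # high -> underfull
    picking_bonus = bonus_scale * max(0.0, pick_pressure - 0.5)
    parking_bonus = bonus_scale * max(0.0, park_pressure - 0.5)
    # The operator's profit is the fare minus the discounts paid as bonuses.
    return base_fare - picking_bonus - parking_bonus

if __name__ == "__main__":
    crowded = Station(stock=18, capacity=20)
    sparse = Station(stock=2, capacity=20)
    # A trip that relieves a crowded station and refills a sparse one
    # costs the operator two bonuses but improves the fleet's balance.
    print(dpb_reward(origin=crowded, dest=sparse, base_fare=5.0))  # 4.2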




Acknowledgments

This work is supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003 and the National Natural Science Foundation of China under Grant No. U1636215.

Author information


Corresponding author

Correspondence to Zhanquan Gu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Data Science in Cyberspace 2019

Guest Editors: Bin Zhou, Feifei Li and Jinjun Chen


About this article


Cite this article

Ren, C., An, L., Gu, Z. et al. Rebalancing the car-sharing system with reinforcement learning. World Wide Web 23, 2491–2511 (2020). https://doi.org/10.1007/s11280-020-00804-z

