
Rebalancing the car-sharing system with reinforcement learning


Abstract

With the boom of the sharing economy, the number of car-sharing corporations has increased notably, providing a variety of travel options and improving convenience and functionality. Because urban residents share similar travel patterns, car-sharing systems often face a spatial imbalance in the distribution of shared cars, especially during rush hours. Redressing this imbalance poses many challenges, such as insufficient data and a large state space. In this study, we propose a new reward method called the Double P (Picking & Parking) Bonus (DPB). We model the research problem as a Markov Decision Process (MDP) and introduce Deep Deterministic Policy Gradient (DDPG), a state-of-the-art reinforcement learning framework, to find a solution. The results show that the rewarding mechanism embodied in the DPB method can indeed guide users' behavior through price leverage, increase user stickiness, and cultivate user habits, thereby boosting the service provider's long-term profit. In addition, taking the battery level of the shared cars into consideration, we use hierarchical reinforcement learning for station scheduling. This scheduling method encourages users to park cars that need charging at the charging posts within a station, ensuring the effective use of charging-post resources and thus the efficient operation of the shared cars.
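Although the paper's exact reward design is not reproduced on this page, the minimal Python sketch below illustrates the general idea behind a Picking & Parking bonus: a trip earns a picking bonus for taking a car from an overfull station and a parking bonus for returning it to an underfull one, so that price leverage nudges users toward rebalancing the fleet. The Station fields, the 0.5 pressure threshold, and the bonus_scale parameter are hypothetical choices made for illustration, not the authors' actual formulation.

from dataclasses import dataclass

@dataclass
class Station:
    stock: int       # shared cars currently parked at the station
    capacity: int    # parking slots at the station

def dpb_reward(origin: Station, dest: Station,
               base_fare: float, bonus_scale: float = 1.0) -> float:
    """Operator's per-trip reward under a Double P Bonus scheme (sketch).

    A picking bonus is granted for taking a car from an overfull station,
    and a parking bonus for returning it to an underfull one, so user
    behavior is steered toward rebalancing through price leverage.
    """
    pick_pressure = origin.stock / origin.capacity       # high -> overfull
    park_pressure = 1.0 - dest.stock / dest.capacity     # high -> underfull
    picking_bonus = bonus_scale * max(0.0, pick_pressure - 0.5)
    parking_bonus = bonus_scale * max(0.0, park_pressure - 0.5)
    # The operator's profit is the fare minus the discounts paid as bonuses.
    return base_fare - picking_bonus - parking_bonus

if __name__ == "__main__":
    crowded = Station(stock=18, capacity=20)
    sparse = Station(stock=2, capacity=20)
    # A trip that relieves a crowded station and refills a sparse one
    # costs the operator two bonuses but improves the fleet's balance.
    print(dpb_reward(origin=crowded, dest=sparse, base_fare=5.0))  # 4.2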




Acknowledgments

This work is supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003 and the National Natural Science Foundation of China under Grant No. U1636215.

Author information


Corresponding author

Correspondence to Zhanquan Gu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Data Science in Cyberspace 2019

Guest Editors: Bin Zhou, Feifei Li and Jinjun Chen


About this article


Cite this article

Ren, C., An, L., Gu, Z. et al. Rebalancing the car-sharing system with reinforcement learning. World Wide Web 23, 2491–2511 (2020). https://doi.org/10.1007/s11280-020-00804-z

