Elsevier

Computers in Industry

Volume 119, August 2020, 103239

Use of Proximal Policy Optimization for the Joint Replenishment Problem

https://doi.org/10.1016/j.compind.2020.103239

Highlights

  • We apply Proximal Policy Optimization (PPO) to the Joint Replenishment Problem.

  • PPO approaches the optimal solution for small-scale settings.

  • PPO outperforms two other heuristics in small and large-scale settings.

  • Machine learning algorithms can facilitate collaborative shipping.

Abstract

Deep reinforcement learning has been hailed as a promising research avenue for solving sequential decision-making problems, especially when little is known about the optimal policy structure. We apply the proximal policy optimization algorithm to the intractable joint replenishment problem. We demonstrate how the algorithm approaches the optimal policy structure and outperforms two other heuristics. Its deployment in supply chain control towers can orchestrate and facilitate collaborative shipping in the Physical Internet.

Introduction

The recent digitization in transport provides new opportunities to improve the efficiency of current logistics networks. By sharing shipment data across companies, freight loads can be combined to increase truck fill rates. In the innovative concept of the Physical Internet, an open interconnected logistics network introduced by [27], companies collaborate by sharing freight and resources [32]. This entails a transition from individually optimized supply chains towards a more holistic and collaborative integrated “network of networks”. To facilitate smooth routing of freight across the network, digital control towers visualize shipments and support replenishment decisions based on real-time information and analytics.

As the foundation of the Physical Internet is built around multi-dimensional collaboration, it complements the emerging sharing economy in which goods and services are being shared and exchanged more easily [7]. Horizontal collaboration has been considered an effective practice for sustainable logistics and freight transport, and it has gained increased attention in recent years [33]. It is named as one of the solutions to effectively decarbonize freight transport [23], [2].

A prerequisite of collaborative shipping, either through bundling less-than-truckload (LTL) or backhauling full truckload (FTL) shipments, is that the replenishment cycles of the collaborating companies are synchronized, i.e., their replenishment occurs at the same time. This requires the implementation of a joint replenishment policy that takes into account the specifics of freight transport to stimulate joint shipments and avoid individual transports. The joint replenishment literature is rich and provides a range of heuristics. Most joint replenishment policies, however, ignore the limitation of the vehicle capacity and do not deal with the (under)utilization of the vehicles. We explore how machine learning can be used to develop joint replenishment policies that can be implemented in a control tower setting to facilitate collaborative shipping.

We implement a state-of-the-art machine learning algorithm to decide on the replenishments of a group of collaborating companies, i.e., how much and when to order or ship. We focus on periodic review joint replenishment policies that take the capacity of a full truck into account. The goal is to minimize (joint) transportation, holding and backorder costs. We deploy the proximal policy optimization (PPO) algorithm, a state-of-the-art optimization method in the domain of deep reinforcement learning (DRL). The PPO algorithm has been praised for capturing some of the stability and convergence properties of trust region policy optimization algorithms while being much simpler to implement and tune. Our numerical results confirm its stable and converging behaviour, and show that it develops well-performing policies.

The implementation of machine learning algorithms, such as the one presented in this paper, can facilitate smart automation of freight shipments in today's digital era. It complements the algorithms developed in [12] to identify collaboration partners based on their geographical compatibility, as well as gain sharing methods such as the ones discussed in [9]. As such, this paper responds to various challenges faced by the current logistics industry, such as the transition towards the Physical Internet, more economically friendly eco-systems, and the digital transformation of logistics.

Section snippets

A review of capacitated joint replenishment policies

The joint replenishment problem (JRP) literature focuses on developing a replenishment policy for multiple items that minimizes the sum of inventory holding, backorder and ordering/transportation costs. The ordering costs typically include a major and a minor ordering cost, with the major ordering cost incurred every time an order is placed, independent of the number of items in the order, and the minor ordering cost charged for every item that is included in the order. The JRP can also be
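The major/minor ordering cost structure described above can be illustrated with a minimal Python sketch. The function name and all cost values are hypothetical and are not taken from the paper; the sketch only shows how the major cost is charged once per order and the minor cost once per included item.

```python
def ordering_cost(order_quantities, major_cost, minor_costs):
    """Ordering cost of one replenishment epoch (illustrative only).

    order_quantities: units ordered per item; 0 means the item is not ordered.
    major_cost: fixed cost charged once whenever at least one item is ordered.
    minor_costs: fixed cost per item, charged only for items in the order.
    """
    included = [q > 0 for q in order_quantities]
    if not any(included):
        return 0.0  # no order placed, so no ordering cost is incurred
    return major_cost + sum(k for k, inc in zip(minor_costs, included) if inc)


# Hypothetical example: items 1 and 3 are ordered, item 2 is not.
cost = ordering_cost([10, 0, 5], major_cost=100.0, minor_costs=[20.0, 30.0, 15.0])
print(cost)
```

Because the major cost is shared, ordering several items together is cheaper than ordering each separately, which is what makes coordination attractive.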

Deep reinforcement learning and its application in inventory management

Reinforcement learning (RL) is the branch of machine learning algorithms that focuses on sequential decision-making problems. An RL agent aims to learn a policy that maximizes future (discounted) rewards by interacting with an environment. For an extensive introduction to RL we refer to [42]. In contrast to supervised machine learning, where for instance a deep neural net is used to classify images and the function is approximated that maps pixels to the label of the image, RL algorithms
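To make the notion of future (discounted) rewards concrete, the following short Python sketch computes the discounted return of a finite reward sequence. The function name and the default discount factor are assumptions made for illustration, not details from the paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards discounted by gamma per time step (illustrative only).

    Computes r_0 + gamma * r_1 + gamma^2 * r_2 + ... by accumulating
    backwards from the last reward.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical example: three unit rewards with a discount factor of 0.5.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

In a cost-minimization setting such as inventory control, the rewards would typically be the negative per-period costs.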

Problem statement

We consider n = 2 shippers and a neutral orchestrator with full visibility on each shipper's demand and inventories, who is in charge of the coordination of their replenishments, for instance using a control tower. To ensure truck efficiency, shipments are only made in FTLs with a capacity of V units. The aggregated order quantity of both shippers in each period t is thus a multiple of an FTL of V units. Denote by q_{i,t} shipper i's order quantity and by M_t the number of truckloads shipped in period t
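The full-truckload constraint described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the bound `max_trucks` on the number of trucks per period are hypothetical and do not come from the paper.

```python
def truckloads(order_quantities, capacity):
    """Number of full trucks needed for the shippers' combined order.

    Raises ValueError if the aggregated order is not a multiple of the
    truck capacity, i.e. if it violates the FTL constraint.
    """
    total = sum(order_quantities)
    if total % capacity != 0:
        raise ValueError("aggregated order must be a multiple of the truck capacity")
    return total // capacity


def feasible_orders(capacity, max_trucks):
    """Enumerate all (q1, q2) pairs for two shippers that fill whole trucks.

    max_trucks is a hypothetical upper bound used to keep the set finite.
    """
    orders = []
    for m in range(max_trucks + 1):
        total = m * capacity
        for q1 in range(total + 1):
            orders.append((q1, total - q1))
    return orders


print(truckloads([12, 8], capacity=10))
print(feasible_orders(capacity=2, max_trucks=1))
```

The enumeration shows how quickly the action space grows with the truck capacity, which is one reason an exact solution becomes hard at larger scale.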

Application of the proximal policy optimization algorithm

The PPO algorithm that we implement is an actor-critic algorithm that leverages elements from trust region policy optimization methods. A neural network is used to develop a policy π(s_t; θ), known as the actor network, and another neural net is used to estimate the value function v_π(s_t; ϕ) when following the policy of the actor, which is the critic network. The values of the parameters θ and ϕ of these neural nets are set by training the algorithm, which we elaborate on below. By clipping the
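The clipping idea behind PPO can be sketched with the standard clipped surrogate objective from the PPO literature. The Python snippet below is a minimal illustration and not the authors' implementation; the function name and the clipping parameter of 0.2 are assumptions, and the paper's own hyperparameters are not shown in this excerpt.

```python
import math


def clipped_surrogate(new_logps, old_logps, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch (illustrative only).

    new_logps / old_logps: log-probabilities of the taken actions under the
    updated and the data-collecting policy, respectively.
    advantages: advantage estimates, e.g. derived from a critic's values.
    clip_eps: assumed clipping range; the probability ratio is kept within
    [1 - clip_eps, 1 + clip_eps].
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to move the policy
        # far away from the one that collected the data.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch: one action became more likely, one less likely.
objective = clipped_surrogate(
    new_logps=[math.log(1.5), math.log(0.5)],
    old_logps=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(objective)
```

In practice the actor parameters would be updated by gradient ascent on this objective using an automatic differentiation library; that part is omitted here.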

Results

We show the performance of the PPO algorithm on our joint replenishment problem in a numerical experiment. We first conduct a small-scale experiment with limited demand support and compare the policies developed by PPO with the optimal policies, found using DP. To solve the DP, we employ value iteration [36] with discount factor γ = 0.99. We evaluate the cost performance, as well as the order quantities and the steady state distributions of the inventory levels. We then expand our study to a
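For readers unfamiliar with value iteration, the following Python sketch applies it to a deliberately tiny, hypothetical cost-minimization problem with discount factor γ = 0.99. The toy states, actions and costs are invented for illustration and bear no relation to the paper's experimental setting.

```python
def value_iteration(states, actions, transition, cost, gamma=0.99, tol=1e-8):
    """Generic value iteration for cost minimization (illustrative only).

    actions(s) -> list of actions available in state s
    transition(s, a) -> list of (probability, next_state) pairs
    cost(s, a) -> immediate cost of taking action a in state s
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = min(
                cost(s, a) + gamma * sum(p * values[s2] for p, s2 in transition(s, a))
                for a in actions(s)
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical toy problem: inventory of 0 or 1 unit, demand of 1 unit per period.
states = [0, 1]


def actions(s):
    return [0, 1] if s == 0 else [0]  # ordering is only possible when empty


def transition(s, a):
    return [(1.0, 0)]  # demand always consumes whatever is available


def cost(s, a):
    if a == 1:
        return 2.0  # hypothetical ordering cost
    return 5.0 if s == 0 else 0.0  # hypothetical penalty for unmet demand


values = value_iteration(states, actions, transition, cost)
print(values)
```

With a discount factor this close to one, convergence takes on the order of a couple of thousand sweeps even for this toy problem, which hints at why DP is only practical for small-scale settings.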

Conclusion

In this paper we use proximal policy optimization to solve the joint replenishment problem with full truckload shipments. Proximal policy optimization is a deep reinforcement learning algorithm that performs well without extensive hyperparameter tuning. We show in a small-scale experiment with limited demand support how proximal policy optimization develops policies that approximate the optimal policy structure and outperforms the periodic review minimum order quantity and dynamic order-up-to

Declarations of interest

None

References (51)

  • D. Atkins et al.

    Periodic versus "can-order" policies for coordinated multi-item inventory systems

    Management Science

    (1988)
  • J.L. Balintfy

    On a basic class of multi-item inventory problems

    Management Science

    (1964)
  • L.d.S.L. Bastos et al.

    A systematic literature review on the joint replenishment problem solutions: 2006–2015

    Production

    (2017)
  • J. Beliën et al.

    Collaborative shipping: Logistics in the sharing economy

    ORMS Today

    (2017)
  • R. Bellman

    The Theory of Dynamic Programming

    Bulletin of the American Mathematical Society

    (1954)
  • R. Boute et al.

    A better way to share the gains of collaborative shipping

    Supply Chain Management Review

    (2018)
  • N.C. Büyükkaramikli et al.

    Coordinated logistics: joint replenishment with capacitated transportation for a supply chain

    Production and Operations Management

    (2014)
  • G. Cachon

    Managing a retailer's shelf space, inventory, and transportation

    Manufacturing & Service Operations Management

    (2001)
  • S. Creemers et al.

    Tri-vizor uses an efficient algorithm to identify collaborative shipping opportunities

    Interfaces

    (2017)
  • J. Gijsbrechts et al.

    Can Deep Reinforcement Learning Improve Inventory Management? Performance and Implementation of Dual Sourcing-Mode Problems

    SSRN Electronic Journal

    (2019)
  • B. Golany et al.

    Comparative analysis of multi-item joint replenishment inventory models

    International Journal of Production Research

    (1992)
  • M. Gürbüz et al.

    Coordinated replenishment strategies in inventory/distribution systems

    Management Science

    (2007)
  • S. Kakade et al.

    Approximately optimal approximate reinforcement learning

    Proceedings of the Nineteenth International Conference on Machine Learning

    (2002)
  • D.P. Kingma et al.

    Adam: A Method for Stochastic Optimization

    In ICLR 2015. arXiv:1412.6980

    (2015)
  • L. Li et al.

    A stochastic joint replenishment problem with dissimilar items

    Decision Sciences

    (2019)