1 Introduction

In manufacturing system management, capacity is a key factor in matching supply with demand, i.e., having a system that is able to produce what is needed to satisfy customer demand.

Several factors negatively impact system capacity. The most frequently studied ones are those related to system balancing and to part batching when setup times are present, as severe bottlenecks and/or small batches can significantly reduce system capacity, thus leading to the inability of the manufacturing system to respond in a timely way to the market demand (Cachon and Terwiesch 2012).

Batches induced by setup times are called serial batches, and although they are very important in manufacturing systems, they are not the only type of batches that can be present on the shop floor. Transfer batches and parallel batches can also be found in manufacturing systems, the former being related to the capacity of the material handling resources, and the latter, like serial batches, to the capacity of the machines.

Although both serial and parallel batches are related to and affect machine capacity, their nature is very different. Serial batches are due to the presence of setup times, while parallel batches stem from the ability of machines to accommodate and manufacture several jobs at the same time. They are less studied than serial and transfer batches because they are less frequent; however, they are no less important.

Specifically, parallel batches can be found in many manufacturing processes where heating operations are necessary, such as in mold manufacturing (Liu et al. 2016) and the semiconductor industry (Mönch et al. 2012), or when there are sterilization phases (Ozturk et al. 2012), just to cite a few examples.

In all these cases, operations take quite a long time, and the machines are usually batch machines that can accommodate several parts and process them simultaneously, precisely so as to share the long processing time among all the parts processed at the same time. Each part has an individual size, and batch machines (e.g., batch ovens for heating treatments or autoclaves for sterilization operations) have a limited capacity; therefore, the number of parts that can fit in a single batch is limited.

Due to the limited capacity of the batch machine, and thus to the limited number of parts that can be accommodated in it, when several jobs have to be processed on the batch machine, they have to be partitioned into several batches. Once batches have been created, their processing has to be scheduled on the machine, and this decision is obviously intertwined with batch creation. Moreover, the two decisions (i.e., how to create batches and how to sequence them on the batch machines) strictly depend on the objective of the shop floor manager (minimizing the number of tardy jobs, minimizing the maximum delay, reducing the total flow time, maximizing machine utilization, etc.).

In this paper, the above-described parallel-batch problem is considered. Specifically, given a set of jobs that are all available at the same time, the paper addresses how to partition them into batches and how to sequence those batches on machines, with the objective of minimizing the total completion time.

With respect to the current literature, the problem addressed in the paper is the same problem as in Rafiee Parsa et al. (2016), with the main difference that it is extended to the parallel-machine case. Following the three-field notation of Graham et al. (1979), the problems studied in the paper are \(1|{p-\text {batch}, s_j\le b}|\sum {C_j}\) and \(Pm|{p-\text {batch}, s_j\le b}|\sum {C_j}\).

A fundamental work in the field of parallel-batch processor scheduling is that of Uzsoy (1994), where a single-batch processing machine problem is studied with regard to makespan and total flow time criteria. In particular, in this work, nonidentical job sizes are taken into account, and complexity results are also provided.

A large part of the literature on parallel batching is devoted to the minimization of the makespan criterion—e.g., Damodaran et al. (2006), Dupont and Dhaenens-Flipo (2002), Rafiee Parsa et al. (2010), Li (2017) and Muter (2020)—while the total flow time problems have been less studied (Jolai Ghazvini and Dupont 1998; Rafiee Parsa et al. 2016).

The work in Jolai Ghazvini and Dupont (1998) and a modified version of the genetic algorithm presented in Damodaran et al. (2006) have been used as benchmark procedures for the hybrid max-min ant system presented in Rafiee Parsa et al. (2016). Different objective functions dealing with tardiness and lateness have also been addressed (e.g., Wang 2011; Malapert et al. 2012; Kosch and Beck 2014; Cabo et al. 2015).

Other recent works on single- and parallel-machine batching problems are Beldar and Costa (2018), Jia et al. (2018) and Ozturk et al. (2017). In Ozturk et al. (2017), the authors address a problem with unit size jobs and maximum completion time objective. In Beldar and Costa (2018) and Jia et al. (2018), instead, the total completion time criterion is considered, although Jia et al. (2018) consider the weighted version and equal processing time for all jobs, while in Beldar and Costa (2018) only the single-machine case is tackled, with constraints on cardinality and size of job batches.

A very recent contribution based on column generation is Ozturk (2020), where the fundamental model is a time-indexed formulation, with a set-partition problem as master, followed by a constructive heuristic that combines parts of the relaxed formulation.

The contribution of this paper is threefold.

  • A new graph-based model for \(1|{p-\text {batch}, s_j\le b}|\sum {C_j}\) is developed; such a model induces a very large linear program with an exponential number of variables, which can be handled by standard column generation techniques. The pricing step is efficiently solved by dynamic programming. The new model provides the strongest linear relaxation currently available in the literature for the studied problems.

  • A heuristic procedure of the so-called price and branch type—following the terminology of Desrosiers and Lübbecke (2011)—relying on the graph model is developed. This procedure allows one to generate high-quality solutions (with certified optimality gaps) for fairly large instances in short computation times. The column generation procedure is also embedded in an exact branch and price procedure that is able to deliver optimal solutions for \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) instances with up to 40 jobs—previous state-of-the-art results in  Azizoglu and Webster (2000) were limited to 25 jobs.

  • The graph-based model and the related column generation procedure and heuristic are also extended to multi-machine problems—\(Pm|{p-\text {batch}, s_j\le b}\) \(|\sum C_j\), \(Rm|{p-\text {batch}, s_j\le b}|\sum C_j\), etc. Computational experience shows that \(Pm|{p-\text {batch}, s_j\le b}|\sum C_j\) for up to five machines and 100 jobs is solvable by the proposed approach with no significant loss of solution quality with respect to the single-machine case.

The remainder of the paper is structured as follows. The column generation approach for the \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) problem is developed in Sect. 2, while Sect. 3 presents the extension of the approach to the parallel-machine case, with special attention to the case with identical parallel machines. Computational results are reported in Sect. 4. Section 5 concludes the paper, discussing directions for future research.

2 Single-machine models

This section deals with the single-machine problem. The new graph-based model is established in Sect. 2.1; the column generation procedure that solves its continuous relaxation is developed in Sect. 2.2, while in Sect. 2.3 a “price and branch” heuristic is formulated. The same algorithmic elements allow us to formulate an exact branch and price method in Sect. 2.4.

In the remainder of the paper, the following notation is used. \(N=\{1,2,\dots ,n\}\) denotes the set of jobs to be scheduled; for each job \(j\in N\), its processing time \(p_j\) and its size \(s_j\), both integers, are given. The machine has a given integer capacity denoted by b. When a subset of jobs is packed in a batch B, \(p_B=\max \{p_j:j\in B\}\) is used to indicate the batch processing time. Every batch B is required to have \(\sum _{j\in B}s_j\le b\). The machine processes the jobs in a batch sequence \(S=(B_1,B_2,\dots ,B_t)\), where each job j in the kth batch \(B_k\) shares the batch completion time: \(C_j=C_{B_k}=\sum _{l=1}^k p_{B_l}\) \(\forall j\in B_k\). The \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) problem calls for creating the batches and sequencing them in order to minimize \(f(S)=\sum _{j\in N}C_j\).

The problem is known to be NP-hard (Uzsoy 1994). A mixed-integer linear programming (MILP) model for this problem, as given by Rafiee Parsa et al. (2016), is as follows, where the variable \(x_{jk}=1\) if job j is scheduled in the kth batch; the model always arranges the jobs in n batches \((B_1,B_2,\dots ,B_n)\), some of which can be empty. The other variables of the model are the completion time \(C_{B_k}\) of the kth batch, the processing time \(p_{B_k}\) of the kth batch, and the completion time \(C_j\) of job j.

$$\begin{aligned} \text {minimize}\quad&\sum _{j = 1}^{n} C_j \end{aligned}$$
(1)
$$\begin{aligned} \hbox {subject to}&\sum _{k = 1}^{n} x_{jk} = 1&j = 1, \dots , n \end{aligned}$$
(2)
$$\begin{aligned}&\sum _{j = 1}^{n} s_j x_{jk} \le b&k = 1, \dots , n \end{aligned}$$
(3)
$$\begin{aligned}&p_{B_k} \ge p_jx_{jk}&j = 1, \dots , n,\ k = 1, \dots , n \end{aligned}$$
(4)
$$\begin{aligned}&C_{B_1} \ge p_{B_1} \end{aligned}$$
(5)
$$\begin{aligned}&C_{B_k} \ge C_{B_{k-1}} + p_{B_{k}}&k = 2, \dots , n \end{aligned}$$
(6)
$$\begin{aligned}&C_j \ge C_{B_k} - M \left( 1 - x_{jk} \right)&j = 1, \dots , n,\ k = 1, \dots , n \end{aligned}$$
(7)
$$\begin{aligned}&p_{B_k}, C_{B_k}, C_j \ge 0&j = 1, \dots , n,\ k = 1, \dots , n \end{aligned}$$
(8)
$$\begin{aligned}&x_{jk} \in \left\{ 0, 1 \right\}&j = 1, \dots , n,\ k = 1, \dots , n \end{aligned}$$
(9)

The total completion time is expressed by (1). Constraint set (2) ensures that each job is assigned to exactly one batch, while constraint set (3) ensures that the total size of the jobs assigned to a batch does not exceed the batch capacity. Constraint set (4) states that the processing time of a batch is the maximum processing time over all the jobs it contains. The completion time of the first batch is simply its processing time, since it is the first to be processed by the machine, as stated in constraint (5). Constraint set (6), instead, ensures that the completion time of each of the other batches is the sum of its processing time and the completion time of the previous batch. Constraint set (7) specifies that the completion time of a job must equal the completion time of the corresponding batch (M being a suitably large constant).

Model (1)–(9) is known to be very weak. A state-of-the-art solver like CPLEX can spend hours on 15-job instances, with optimality gaps of \(100\%\) at the root branching node.

Fig. 1 Batch sequence as a path on a graph. The job set N is partitioned over the arcs of P. \(C_j(S)\) is the completion time of job j induced by sequence S. The path provides information about partitioning the jobs into batches \(B_1,B_2,B_3,B_4\) and their ordered sequence

2.1 A new problem formulation

This paper proposes a new model where a batch sequence is represented as a path on a graph. Let \(\mathcal {B}=\{B\subseteq N:\sum _{j\in B}s_j\le b\}\) be the set of all possible batches. Define a (multi)graph G(V, A) with vertex set \(V=\{1,2,\dots ,n+1\}\) and arc set \(A=\{(i,k,B):1\le i<k\le n+1,\,B\in \mathcal {B},\,|B|=k-i\}\).

Each arc in A is a triple (i, k, B) with head k and tail i and an associated batch B with \((k-i)\) jobs; it represents the batch B scheduled in a batch sequence such that exactly \(n-i+1\) jobs are scheduled from batch B up to the end of the sequence. For each arc (i, k, B), a cost is defined as \(c_{ikB}=(n - i + 1)p_B\), with \(p_B=\max \{p_j:j\in B\}\), where the \((n-i+1)\) factor precisely models the abovementioned \(n-i+1\) jobs to the end of the sequence. Note that in G, a path

$$\begin{aligned} P\!=\![(i_1,k_1,B_1),(i_2,k_2,B_2),\dots ,(i_r,k_r,B_r)]\ \ (i_\ell =k_{\ell -1}) \end{aligned}$$

connecting nodes from \(i_1\) to \(k_r\) has the property

$$\begin{aligned} \sum _{\ell =1}^r|B_\ell |= & {} \sum _{\ell =1}^r(k_\ell -i_\ell )= \sum _{\ell =1}^r k_\ell -\sum _{\ell =1}^ri_\ell \nonumber \\= & {} \sum _{\ell =1}^rk_\ell -\sum _{\ell =2}^rk_{\ell -1}-i_1=k_r-i_1. \end{aligned}
(10)

Property 1 highlights the relationship between feasible batches and paths of the above-defined graph.

Property 1

Each feasible batch sequence S corresponds to a path P in G(V, A) from 1 to \(n+1\) such that the set of jobs N is partitioned over the arcs in P, and \(f(S)=\sum \{c_{ikB}:(i,k,B)\in P\}\).

Proof

Refer to Fig. 1 for an illustration of the idea. The one-to-one mapping of batch sequences to paths is easily established. Given a batch sequence \(S=(B_1,B_2,\dots ,B_t)\), the path

$$\begin{aligned} P=[(i_1,k_1,B_1),(i_2,k_2,B_2),\dots ,(i_t,k_t,B_t)] \end{aligned}$$
(11)

can be built by choosing the arcs from the arc set of G as follows:

$$\begin{aligned} \begin{aligned} (i_1,k_1,B_1)&\qquad i_1=1,&k_1=i_1+|B_1|\\ (i_\ell ,k_\ell ,B_\ell )&\qquad i_\ell =k_{\ell -1},&k_\ell =i_\ell +|B_\ell |&\qquad \ell =2,\dots ,t \end{aligned} \end{aligned}$$

(12)

Note that \(k_t-i_1=\sum _{\ell =1}^t|B_\ell |=n\), and hence \(k_t=n+1\), and P is a path from 1 to \(n+1\) in G; the job set N is guaranteed to be partitioned over the arcs of P because it is partitioned in the batches of S by hypothesis.

On the other hand, given a path P like (11) from node 1 to node \(n+1\) of G, such that \(\{B:(i,k,B)\in P\}\) forms a partition of N, it can be easily seen that the arcs of P satisfy (12) by connectivity of the path, and \(\sum _{j\in B}s_j\le b\) holds for all \((i,k,B)\in P\) by definition of the arc set A. Hence, \(S=(B_1,B_2,\dots ,B_t)\) with the \(B_1,\dots ,B_t\) defined by the arc-batches of P is a feasible batch sequence.

It remains to be proven that \(f(S)=\sum _{(i,k,B)\in P}c_{ikB}\). For an arc \((i_q,k_q,B_q)\) in position q on P, the number of jobs scheduled from \(B_q\) to \(B_t\) is \(\sum _{\ell =q}^t|B_\ell |=k_t-i_q=(n+1-i_q)\). The objective function for the batch sequence is

$$\begin{aligned} f(S)= & {} \sum _{q=1}^t\sum _{j\in B_q}C_j=\sum _{q=1}^t|B_q|\,C_{B_q}= \sum _{q=1}^t|B_q|\sum _{\ell =1}^qp_{B_\ell }. \end{aligned}$$

Adding by column, it can be rewritten as:

$$\begin{aligned} f(S)= & {} \sum _{\ell =1}^tp_{B_\ell }\sum _{q=\ell }^t|B_q|= \sum _{\ell =1}^t(n+1-i_\ell )\,p_{B_\ell }=\sum _{(i,k,B)\in P}c_{ikB}. \end{aligned}$$

\(\square \)

Fig. 2 Example of a full graph with \(b=10\) and four jobs. Job sizes are \(s_1=3, s_2=4, s_3=6, s_4=8\)

Example Figure 2 shows an example of a full graph with all possible batches. In this small four-job example, \(p_1,\dots ,p_4=42,37,21,16\), \(s_1,\dots ,s_4=3,4,6,8\) and \(b=10\), so batches can contain at most two jobs; note how the graph size is already quite large, even for this small example.

A path from node 1 to node 5 in this graph represents a feasible schedule if the job set \(\{1,2,3,4\}\) is partitioned over the arcs of the path. For example, the path \(P=[(1,2,\{4\}), (2,4,\{1, 3\}),(4,5,\{2\})]\) represents the feasible batch sequence \(S=(\{4\},\{1,3\},\{2\})\). In this sequence, the reader can easily compute that \(C_4=16\), \(C_1=C_3=58\), \(C_2=95\) and \(C_1+C_2+C_3+C_4=227\); in the path P, the arc costs are, by definition, \(c_{(1,2,\{4\})}=4p_4=64\), \(c_{(2,4,\{1,3\})}=3p_1=126\), \(c_{(4,5,\{2\})}=p_2=37\), and the total cost of P is \(c_{(1,2,\{4\})}+c_{(2,4,\{1,3\})}+c_{(4,5,\{2\})}=227\).

On the other hand, a path like \(P'=[(1,2,\{3\}),(2,4,\{1, 3\}),(4,5,\{3\})]\) does not represent a feasible batch sequence since it fails to partition the job set \(\{1,2,3,4\}\) over its arcs.
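
As a concrete check of the arc-cost definition, the following C++ fragment recomputes the example above in both ways: directly from the batch completion times, and as the total cost of the corresponding path. This is a minimal illustrative sketch; all names are placeholders and are not taken from the implementation used in the experiments.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Four-job example: p = (42, 37, 21, 16); jobs are 0-based here.
    std::vector<int> p = {42, 37, 21, 16};
    const int n = 4;

    // Batch sequence S = ({4}, {1,3}, {2}) of the example.
    std::vector<std::vector<int>> S = {{3}, {0, 2}, {1}};

    // Total completion time computed directly from the schedule:
    // every job in a batch inherits the batch completion time.
    long f = 0, C = 0;
    for (const auto& B : S) {
        int pB = 0;
        for (int j : B) pB = std::max(pB, p[j]);
        C += pB;                          // completion time of this batch
        f += C * (long)B.size();          // each job in B completes at C
    }

    // The same value as a path cost: arc (i, k, B) costs (n - i + 1) * p_B,
    // where node i is the arc tail and k = i + |B| is its head.
    long cost = 0;
    int i = 1;
    for (const auto& B : S) {
        int pB = 0;
        for (int j : B) pB = std::max(pB, p[j]);
        cost += (long)(n - i + 1) * pB;
        i += (int)B.size();
    }

    std::printf("sum C_j = %ld, path cost = %ld\n", f, cost);  // both 227
    return 0;
}
```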

By Property 1, then, an optimal solution for the \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) problem can be computed by identifying on the very large graph G(VA) a minimum-cost path from node 1 to node \(n+1\) such that the job set N is exactly partitioned over the “B” components of the arcs in that path. This problem can be modeled by a very large linear program that includes features of a shortest path/minimum cost flow model as well as partition constraints, as follows. Let \(\varvec{a_B}\in \{0,1\}^n\) be the incidence column-vector of job set B, whose jth component \((\varvec{a_{B}})_j=1\) if \(j\in B\), and \(\mathbf{1}=(1,1,\dots ,1)^\mathsf {\scriptscriptstyle T}\) an all-ones column-vector with n components. Define binary decision variables \(x_{ikB}\) for each \((i,k,B)\in A\), so that \(x_{ikB}=1\) if arc (ikB) is on the optimal path from 1 to \((n+1)\). The linear program is written as follows.

$$\begin{aligned} \text {minimize}\quad&\mathop {\sum }\limits _{{(i,k,B)\in A}} c_{ikB}x_{ikB} \end{aligned}$$
(13)
$$\begin{aligned} \hbox {subject to}&\mathop {\mathop {\sum }\limits _{{(k,B):}}}\limits _{(i,k,B)\in A}x_{ikB} -\mathop {\mathop {\sum }\limits _{(k,B):}}\limits _{(k,i,B)\in A}x_{kiB}= {\left\{ \begin{array}{ll} 1&{}i=1\\ 0&{}i=2,\dots ,n\\ -1&{}i=n+1 \end{array}\right. } \end{aligned}$$
(14)
$$\begin{aligned}&\mathop {\sum }\limits _{{(i,k,B)\in A}} \varvec{a_B} x_{ikB} = \mathbf {1} \end{aligned}$$
(15)
$$\begin{aligned}&x_{ikB} \in \left\{ 0, 1 \right\} \qquad (i,k,B)\in A \end{aligned}$$
(16)

The objective function (13) together with the flow conservation constraints (14) is a classical formulation of the shortest path problem as a special case of a single-source single-sink minimum-cost flow linear program (see for example Ahuja et al. 1993). The vector expression of constraint (15) represents a group of n set-partitioning constraints on the job set \(N=\{1,2,\dots ,n\}\), enforcing the requirement that the job set is exactly partitioned over the arcs selected to be in the path—in scalar form, \(\sum _{(i,k,B)\in A} (\varvec{a_B})_jx_{ikB}=1\ \forall j\in N\).
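
For intuition, the arc set A and the cost coefficients of model (13)–(16) can be enumerated by brute force on small instances. The sketch below lists all arcs for the four-job example of Fig. 2; it is illustrative only, since a real implementation generates columns on demand, as in Sect. 2.2.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Enumerate the arc set A of G(V, A) by brute force over job subsets:
// each feasible batch B (total size <= b) yields the arcs (i, i + |B|, B)
// for i = 1, ..., n + 1 - |B|, with cost (n - i + 1) * p_B.
int main() {
    std::vector<int> p = {42, 37, 21, 16};   // processing times (0-based jobs)
    std::vector<int> s = {3, 4, 6, 8};       // job sizes
    const int n = 4, b = 10;

    int arcs = 0;
    for (int mask = 1; mask < (1 << n); ++mask) {    // all nonempty subsets
        int size = 0, card = 0, pB = 0;
        for (int j = 0; j < n; ++j)
            if (mask & (1 << j)) {
                size += s[j]; ++card; pB = std::max(pB, p[j]);
            }
        if (size > b) continue;                      // infeasible batch
        for (int i = 1; i + card <= n + 1; ++i) {    // one arc per tail node
            std::printf("arc (i=%d, k=%d, batch mask=%d) cost=%d\n",
                        i, i + card, mask, (n - i + 1) * pB);
            ++arcs;
        }
    }
    std::printf("total arcs: %d\n", arcs);           // 25 for this example
    return 0;
}
```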

2.2 Continuous relaxation for the new graph-based formulation: column generation

The continuous relaxation of (13)–(16), where the integrality constraints (16) are relaxed to

$$\begin{aligned} x_{ikB}\ge 0\qquad (i,k,B)\in A, \end{aligned}$$
(16')

is solved by means of a column generation procedure. Model (13)–(16’) is the master problem: a restricted master problem is made of a subset \(A'\subset A\) of arcs. Introducing dual variables \(u_1,u_2,\dots ,u_{n+1}\) for constraints (14) and \(v_1,\dots ,v_n\) for constraints (15), the dual of (13)–(16’) is

$$\begin{aligned} \text {maximize}\quad&u_1-u_{n+1}+\sum _{j=1}^n v_j\end{aligned}$$
(17)
$$\begin{aligned} \hbox {subject to}&u_i-u_k+\sum _{j\in B} v_j\le c_{ikB}&(i,k,B)\in A. \end{aligned}$$
(18)

Solving the restricted master problem leads to a basic feasible solution for the master problem and values for the dual variables/simplex multipliers \(\varvec{u}\), \(\varvec{v}\). Pricing the arcs \((i,k,B)\in A\) corresponds to finding the most violated dual constraints (18). The strategy developed in this paper is to price the arcs separately for each pair of indices i, k with \(i<k\), therefore determining minimum (possibly negative) reduced costs

$$\begin{aligned} \bar{c}_{ikB^*}= & {} \min _B\Bigg \{(n-i+1)p_B-\sum _{j\in B}v_j:\nonumber \\&\quad \sum _{j\in B} s_j\le b,\,|B|=k-i\Bigg \}-(u_i-u_k). \end{aligned}$$
(19)

For fixed indices i, k with \(i<k\), the \((u_i-u_k)\) part of (19) is constant, and the cardinality of B is also fixed at \(|B|=k-i\).

Finding the batch B that minimizes (19) for each given pair of indices i, k with \(i<k\) and given batch processing time \(p_B\) can be done by exploiting the dynamic programming state space of a family of cardinality-constrained knapsack problems where items correspond to jobs. Assume that the jobs are indexed in longest processing time (LPT) order, so that

$$\begin{aligned} p_1\ge p_2\ge \dots \ge p_n. \end{aligned}$$

Define, for \(r=1,\dots ,n\),

$$\begin{aligned} g_r(\tau ,\ell )= & {} \max \Bigg \{\sum _{j=r}^nv_jy_j:\sum _{j=r}^ns_jy_j\le \tau ,\\&\sum _{j=r}^ny_j=\ell ,y_j\in \{0,1\}\Bigg \}, \end{aligned}$$

where \(g_r(\tau ,\ell )\) is the optimal value of a knapsack with profits \(v_j\) and sizes \(s_j\), limited to items/jobs \(r,r+1,\dots ,n\), total size \(\le \tau \) and cardinality \(\ell \). Variable \(y_j\) is set to 1 if item/job j is included in the solution.

Optimal values for \(g_r(\tau , \ell )\) can be recursively computed (see Kellerer et al. 2004) as

$$\begin{aligned} g_r(\tau ,\ell )=\max {\left\{ \begin{array}{ll} g_{r+1}(\tau -s_r,\ell -1)+v_r&{}(y_r=1)\\ g_{r+1}(\tau ,\ell )&{}(y_r=0) \end{array}\right. } \end{aligned}$$

with boundary conditions

$$\begin{aligned} g_r(\tau ,1)&={\left\{ \begin{array}{ll} v_r&{}\text {if}\,s_r\le \tau \,(y_r=1)\\ 0&{}\text {otherwise}\,(y_r=0) \end{array}\right. }&\quad r=1,\dots ,n,\,\tau =0,\dots ,b\\ g_r(\tau ,0)&=0&\quad r=1,\dots ,n,\, \tau =0,\dots ,b\\ g_r(\tau ,\ell )&=-\infty&\quad \text {if}\,\ell >n-r+1\hbox { or }\tau <0. \end{aligned}$$

The corresponding optimal job sets are denoted by \(B_r(\tau ,\ell )\); such sets can be retrieved by backtracking. The following property establishes that the state space \(g_r(\tau ,\ell )\) is sufficient for pricing all relevant arcs.
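
A compact C++ realization of this recursion is sketched below, with the boundary conditions folded into the base cases. Following the deferred (top-down) memoization strategy described later for the pricing procedure, entries are computed on first use and then cached. The names, the 1-based indexing and the map-based cache are illustrative assumptions, not the authors' implementation.

```cpp
#include <algorithm>
#include <map>
#include <tuple>
#include <vector>

// Memoized dynamic program for the cardinality-constrained knapsacks
// g_r(tau, ell): maximum sum of profits v_j over jobs j in {r, ..., n}
// with total size <= tau and exactly ell jobs chosen. Jobs are assumed
// indexed in LPT order (p_1 >= ... >= p_n); arrays are 1-based (index 0 unused).
struct KnapsackDP {
    int n = 0, b = 0;
    std::vector<int> s;       // s[1..n], job sizes
    std::vector<double> v;    // v[1..n], dual prices of constraints (15)
    std::map<std::tuple<int, int, int>, double> memo;

    static constexpr double NEG_INF = -1e100;  // stands for minus infinity

    double g(int r, int tau, int ell) {
        if (tau < 0 || ell > n - r + 1) return NEG_INF;   // infeasible state
        if (ell == 0) return 0.0;
        auto key = std::make_tuple(r, tau, ell);
        auto it = memo.find(key);
        if (it != memo.end()) return it->second;          // cached on first use
        double take = (s[r] <= tau)                       // y_r = 1
                          ? g(r + 1, tau - s[r], ell - 1) + v[r]
                          : NEG_INF;
        double skip = g(r + 1, tau, ell);                 // y_r = 0
        return memo[key] = std::max(take, skip);
    }

    // Retrieve an optimal job set B_r(tau, ell) by backtracking;
    // assumes the state (r, tau, ell) is feasible, i.e., g > -infinity.
    std::vector<int> batch(int r, int tau, int ell) {
        std::vector<int> B;
        while (ell > 0) {
            if (s[r] <= tau &&
                g(r, tau, ell) == g(r + 1, tau - s[r], ell - 1) + v[r]) {
                B.push_back(r);
                tau -= s[r];
                --ell;
            }
            ++r;
        }
        return B;
    }
};
```

Because both g and batch read the same cached values, the equality test in the backtracking compares identical floating-point computations and is therefore safe.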

Property 2

Let \(L=\{1\}\cup \{j>1:p_j<p_{j-1}\}\). For any given pair of indices i, k with \(i<k\), an arc with minimum reduced cost \((i,k,B^*)\) is one of

$$\begin{aligned} (i,k,B_r(b,k-i))\qquad r\in L. \end{aligned}$$
(20)

Proof

Every arc (i, k, B) can be shown to have a reduced cost not less than that of some arc in (20). Let \(\bar{c}_{ikB}=(n-i+1)p_B-\sum _{j\in B}v_j-(u_i-u_k)\) be the reduced cost of an arc (i, k, B). Recall that \(|B|=k-i\), and the jobs are numbered in non-increasing order of processing times. Choose r as the smallest job index such that \(p_{r}=p_B\). Note that \(B\subseteq \{r,r+1,\dots ,n\}\) and \(r\in L\). Consider knapsack \(g_{r}(b,k-i)\) and the associated optimal subset \(B_{r}=B_{r}(b,k-i)\). The batch B is a feasible solution for knapsack \(g_{r}(b,k-i)\); hence \(\sum _{j\in B}v_j\le g_r(b,k-i)\); also, because of the choice of r, \(p_{B_r}\le p_r=p_B\). Thus,

$$\begin{aligned} \bar{c}_{ikB}=&(n-i+1)p_B-\sum _{j\in B}v_j-(u_i-u_k)\ge \\ \ge&(n-i+1)p_{B_r}-g_r(b,k-i)-(u_i-u_k)= \bar{c}_{ikB_r}. \end{aligned}$$

\(\square \)

All the relevant arcs with minimum reduced cost can be generated by the procedure reported in Algorithm 1 (NewCols).

Algorithm 1: NewCols

The size of the state space required for the pricing is bounded by \(\mathcal O(n^2b)\), while the pricing procedure can have two bottlenecks:

  (a)

    the \(\mathcal O(n^3)\) effort due to the three nested loops on lines 4, 6, 8. The while on line 6 can be executed n times in the worst case;

  (b)

    filling the state space \(g_r(\tau ,\ell )\), which requires at most \(\mathcal O(n^2b)\) arithmetic operations. A memorized dynamic programming table is used, so that the execution of the top-down recursion for computing an entry \(g_r(\tau ,\ell )\) is deferred until the first time the value is queried. Then, the value is kept in storage and accessed in \(\mathcal O(1)\) time if it is queried again.

Because of these two possible bottlenecks, the running time of NewCols is bounded from above by \(\mathcal O(\max (n^3, n^2b))\).
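
Based on Property 2 and the complexity discussion above, the core loop of NewCols can be reconstructed as sketched below. The callables g and batch stand for the memoized dynamic program sketched earlier; the code is an illustrative reconstruction of the procedure's description, not the authors' listing, and for clarity it retrieves every candidate batch (a production version would avoid the extra backtracking).

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Sketch of the NewCols pricing loop. For each pair (i, k) with i < k,
// only the knapsack optima B_r(b, k - i) with r in L need to be priced
// (Property 2); columns with strictly negative reduced cost are returned.
struct PricedArc { int i, k; std::vector<int> B; double rc; };

std::vector<PricedArc> newCols(
    int n, int b,
    const std::vector<int>& p,         // p[1..n], LPT order (index 0 unused)
    const std::vector<double>& u,      // u[1..n+1], duals of constraints (14)
    const std::function<double(int, int, int)>& g,
    const std::function<std::vector<int>(int, int, int)>& batch)
{
    // L = {1} plus every index where the processing time strictly drops.
    std::vector<int> L = {1};
    for (int j = 2; j <= n; ++j)
        if (p[j] < p[j - 1]) L.push_back(j);

    std::vector<PricedArc> cols;
    for (int i = 1; i <= n; ++i) {
        for (int k = i + 1; k <= n + 1; ++k) {
            double best = 0.0;                 // accept only negative costs
            std::vector<int> bestB;
            for (int r : L) {
                if (k - i > n - r + 1) break;  // too few jobs from r onwards
                double gv = g(r, b, k - i);
                if (gv < -1e99) continue;      // infeasible knapsack state
                std::vector<int> B = batch(r, b, k - i);
                int pB = 0;                    // actual batch processing time
                for (int j : B) pB = std::max(pB, p[j]);
                double rc = (double)(n - i + 1) * pB - gv - (u[i] - u[k]);
                if (rc < best) { best = rc; bestB = std::move(B); }
            }
            if (!bestB.empty())
                cols.push_back({i, k, std::move(bestB), best});
        }
    }
    return cols;
}
```

Wrapping the KnapsackDP sketch above, g and batch would be lambdas over a shared dp object, so that the memo table is reused across all (i, k) pairs.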

2.3 Heuristic procedure: price and branch

The column generation described in the previous section is used to solve the continuous relaxation of the master problem. Once the relaxed optimum has been found, the resulting restricted master problem is taken, the variables are set to binary type, and the resulting MILP is solved using CPLEX in order to obtain a heuristic solution for the master. This is often called “price and branch,” as opposed to the exact approach of branch and price.

In order to generate the initial column set, the jobs are sorted in order of shortest processing time (SPT), and all possible arcs with feasible batches composed of SPT-consecutive jobs are generated (Algorithm 2, InitCols).
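
A possible rendering of InitCols is sketched below: jobs are sorted in SPT order, every feasible batch of SPT-consecutive jobs is built, and one arc is created for each position the batch can occupy along the path from node 1 to node \(n+1\). Names are illustrative; this is one plausible reading of the description, not the authors' code.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Arc { int i, k; std::vector<int> B; };

// Sketch of InitCols: generate all arcs whose batches are feasible sets
// of SPT-consecutive jobs. Jobs are 0-based here.
std::vector<Arc> initCols(const std::vector<int>& p,
                          const std::vector<int>& s, int b) {
    const int n = (int)p.size();
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int c) { return p[a] < p[c]; });   // SPT order

    std::vector<Arc> cols;
    for (int first = 0; first < n; ++first) {
        std::vector<int> B;
        int size = 0;
        for (int last = first; last < n; ++last) {
            size += s[order[last]];
            if (size > b) break;          // batch would exceed capacity
            B.push_back(order[last]);
            // One copy of the batch for every feasible tail node i.
            for (int i = 1; i + (int)B.size() <= n + 1; ++i)
                cols.push_back({i, i + (int)B.size(), B});
        }
    }
    return cols;
}
```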

The complete heuristic procedure is sketched in Algorithm 3.

Algorithm 2: InitCols
Algorithm 3: the complete price and branch heuristic

2.4 Exact approach: branch and price

Given the strong relaxation from the column generation procedure, a natural step is trying to embed it in an exact algorithm—branch and price. The main issue in this step is being able to preserve the pricing problem structure at every node in the search tree. Trying to use the classical branching scheme from Foster and Ryan (1976), which leverages the partitioning constraints forcing pairs of jobs to always/never be batched together, would require handling disjunctive constraints in the pricing problem. This would make the latter a strongly NP-hard disjunctive knapsack problem, ruling out the possibility of using the dynamic programming procedure of Algorithm 1 (unless P=NP).

Still, branching can be performed on a compact formulation of the problem, building the batch sequence by scheduling one job at a time starting from the first position. The basic branching mechanism adopted here is the same as described in Uzsoy (1994) and Azizoglu and Webster (2000). Let \(S=(B_1,B_2,\dots ,B_t)\) be a partial (possibly empty) batch sequence built at the current search node, and \(\hat{N} = N \setminus (B_1 \cup B_2 \cup \dots \cup B_t)\) the set of unscheduled jobs at such node. If the node is not fathomed by bound, two types of branching can take place.

  (I)

    A new unscheduled job \(j\in \hat{N} \) is added to \(B_t\), provided that \(\sum _{i\in B_t}s_i+s_j\le b\).

  (II)

    Batch \(B_t\) is closed, and a new one \(B_{t+1}\) is started, choosing its longest job among the \(j\in \hat{N}\).

New jobs are added to the open batch in non-increasing order of processing time; hence, if a job j has been added to \(B_t\), no other job \(j'\) with \(p_{j'}>p_j\) will enter the same batch in successive branches. Also, batch \(B_t\) is closed only if it is maximal: as long as there are jobs that can be added to \(B_t\) without exceeding the capacity b, type II branches from the current node are prevented.
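
The following sketch illustrates one way of implementing the two branching rules. Jobs are assumed pre-sorted in LPT order, so the longest unscheduled job is the one with the smallest index, and maximality of the open batch is checked over the jobs still allowed to enter it; this interpretation of the scheme is offered for illustration only and is not the authors' implementation.

```cpp
#include <vector>

// Illustrative sketch of the branching scheme. A node stores the partial
// batch sequence; jobs are indexed 0..n-1 in LPT order.
struct Node {
    std::vector<std::vector<int>> batches;  // B_1, ..., B_t (B_t is open)
    std::vector<bool> scheduled;            // scheduled[j], j = 0..n-1
    int openSize = 0;                       // total size of the open batch
};

std::vector<Node> children(const Node& nd, const std::vector<int>& s, int b) {
    const int n = (int)s.size();
    std::vector<Node> out;
    bool canExtend = false;

    // Type I branches: add an unscheduled job to the open batch B_t.
    // Jobs enter a batch in non-increasing p_j order, so only indices
    // greater than the last job added to B_t are candidates.
    if (!nd.batches.empty()) {
        int lastJob = nd.batches.back().back();
        for (int j = lastJob + 1; j < n; ++j) {
            if (nd.scheduled[j] || nd.openSize + s[j] > b) continue;
            canExtend = true;
            Node child = nd;
            child.batches.back().push_back(j);
            child.scheduled[j] = true;
            child.openSize += s[j];
            out.push_back(std::move(child));
        }
    }

    // Type II branches: close B_t (allowed only when it is maximal) and
    // start B_{t+1}, branching over the choice of its longest job.
    if (!canExtend) {
        for (int j = 0; j < n; ++j) {
            if (nd.scheduled[j]) continue;
            Node child = nd;
            child.batches.push_back({j});
            child.scheduled[j] = true;
            child.openSize = s[j];
            out.push_back(std::move(child));
        }
    }
    return out;
}
```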

The column generation-based relaxation is solved at each node of the search tree; batches from the partial batch sequence \((B_1,B_2,\dots ,B_{t-1})\) correspond, in the relaxation, to arc-variables fixed at 1. At non-root nodes, the “open” batch \(B_t\) at the end of the partial sequence is handled by imposing constraints on the state space to be searched in the pricing step. The following can be observed:

  • Only items corresponding to jobs \(j\in \hat{N}\) enter the state space.

  • For pricing arcs (i, k, B) with \(i=1+\sum _{\kappa =1}^{t-1}|B_{\kappa }|\), \(B\supseteq B_t\)—these are arcs extending the “open” batch at the tail of the partial sequence—the items in \(B_t\) are preloaded in the knapsack; hence, only states \(g_r(\tau ,\ell )\) with capacity \(\tau \le (b-\sum _{i\in B_t}s_i)\) and index \(r>\max \{i:i\in B_t\}\) are solved.

  • For all other arcs, a pricing with full capacity b is performed.

The nodes in the search tree are expanded in depth-first order. Feasible solutions are generated at the root node by running the price and branch procedure in Algorithm 3, and at the leaves of the search tree when the batch sequence is completed. Although Algorithm 3 implies running a branch and bound itself, it is very fast for the tested problem sizes and allows us to achieve the highest-quality feasible solutions. Running quick and dirty heuristics at the intermediate nodes did not significantly improve the performance during preliminary testing.

3 Parallel-machine models

Model (13)–(16) is readily extended to parallel-machine cases. Consider the fairly general \(Rm|{p-\text {batch}, s_j\le b}|\sum C_j\) problem with m parallel unrelated machines. Let \(p_{jh}\) be the processing time of job j on machine h. A special type of arc with empty batches is added to the graph developed for the single-machine case, extending the arc set to

$$\begin{aligned} A\cup \{(1,k,\emptyset ):k=2,\dots ,n+1\}. \end{aligned}$$

Arcs \((i,k,B)\in A\) are given machine-dependent costs \(c_{ikB}^h=p_{Bh}(n-i+1)\), with \(p_{Bh}=\max \{p_{jh}:j\in B\}\). Empty arcs \((1,k,\emptyset )\) are given costs \(c_{1k\emptyset }^h=0\), \(k=2,\dots ,n+1\), \(h=1,\dots ,m\).

Remark Eq. (10) now holds in G only for paths of non-empty arcs connecting nodes from \(i_1\) to \(k_r\); i.e., for

$$\begin{aligned}&\!\!\!P=[(i_1,k_1,B_1),(i_2,k_2,B_2),\dots ,(i_r,k_r,B_r)]\\&\!\!\!\text {with}\,B_1\ne \emptyset , \end{aligned}$$

\(\sum _{l=1}^r|B_l|=k_r-i_1\) holds. Note that at most the first arc in a path can be empty (if \(i_1=1\) and \(B_1=\emptyset \)).

Empty arcs are all added to the restricted master problem from the beginning, so they do not need to be considered in the dynamic programming pricing procedure. A feasible solution is composed of m batch sequences

$$\begin{aligned} S_h=(B_1^h,\dots ,B_{t_h}^h),\qquad h=1,\dots ,m \end{aligned}$$
(21)

processed by the m machines. Such batch sequences correspond to m paths (one path for each machine) from node 1 to \(n+1\), over the non-empty arcs of which the set of jobs is exactly partitioned. In general, such paths have an empty arc as the first arc. Note that if (i, k, B) is on the hth path, then \(n-i+1\) jobs will be scheduled from B to the end of the hth batch sequence.

Property 1 can be easily extended to the multi-machine case. Figure 3 shows a sketch of the proof with \(m=2\). The empty arcs act as placeholders.

Fig. 3 Batch sequences on two machines as a collection of two paths on a graph. The job set N is partitioned into \(B_1,B_2,B_3,B_4\). The two paths provide a partition into batches and sequencing information

The idea is formalized as follows.

Property 3

Each feasible set of batch sequences \(\{S_h\}_{h=1}^m\) corresponds to a set of paths \(\{P_h\}_{h=1}^m\) such that the job set N is partitioned over the non-empty arcs of \(\{P_h\}_{h=1}^m\), and \(\sum _{j\in N}C_j(S_1,\dots ,S_m)=\sum _{h=1}^m\sum _{(i,k,B)\in P_h}c_{ikB}^h\).

Proof

A one-to-one mapping between feasible solutions \(\{S_h\}_{h=1}^m\) and collections of paths \(\{P_h\}_{h=1}^m\) is quickly established.

Suppose that a feasible collection of batch sequences \(S_1,\dots ,S_m\) is given. Take a machine h and its batch sequence \(S_h=(B_1,\dots ,B_{t})\)—the dependence on h is dropped from the batch notation in order to keep it simple. The batch sequence \(S_h\) can be empty (i.e., the solution leaves machine h idle) or not.

If \(S_h=\emptyset \), define \(P_h=[(1,n+1,\emptyset )]\) with a single empty arc.

If \(S_h\ne \emptyset \), by definition of the arc set A, the following arcs belong to the graph G:

$$\begin{aligned} \begin{aligned} (i_t,k_t,B_t)&\qquad k_t=n+1,&i_t=k_t-|B_t|\\ (i_\ell ,k_\ell ,B_\ell )&\qquad k_\ell =i_{\ell +1},&i_\ell =k_\ell -|B_\ell |&\qquad \ell =t-1,\dots ,1\\ (i_0,k_0,\emptyset )&\qquad k_0=i_1,&i_0=1&\qquad \text {if}\, i_1>1. \end{aligned} \end{aligned}$$

If \(i_1=1\), arc \((i_0,k_0,\emptyset )\) is omitted. The arcs are chosen so that \(k_\ell =i_{\ell +1}\), \(k_t=n+1\), and the first arc has a tail in node 1.

In both cases addressed above, \(P_h\) identifies a path from node 1 to node \(n+1\). Repeat the construction for each machine \(h=1,\dots ,m\) to obtain the collection of paths \(P_1,\dots ,P_m\); the partition of the job set over the non-empty chosen arcs is guaranteed by the fact that the batches in \(\{S_h\}_{h=1}^m\) already form a partition of N by hypothesis.

On the other hand, given a collection of paths \(P_1,\dots ,P_m\), all connecting node 1 to node \(n+1\), with the job set N partitioned over the non-empty arcs of such collection, take a machine index h, and let \(P_h=[(i_0,k_0,B_0),(i_1,k_1,B_1),\dots ,(i_t,k_t,B_t)]\). The batch sequence \(S_h\) is defined by \(S_h=(B_1,\dots ,B_t)\) if \(B_0=\emptyset \); otherwise \(S_h=(B_0,\dots ,B_t)\). Note that, by definition of the arc set A, at most \(B_0\) can be empty. If \(P_h=[(1,n+1,\emptyset )]\), \(S_h=\emptyset \), and machine h is left idle. Repeat for each machine \(h=1,\dots ,m\) in order to obtain a batch sequence for each machine. The feasibility of \(S_1,\dots ,S_m\) is guaranteed by the fact that by hypothesis, the job set N is partitioned over the set of non-empty arcs of \(P_1,\dots ,P_m\).

It remains to be proven that \(\sum _{j=1}^nC_j(S_1,\dots ,S_m)=\sum _{h=1}^m\sum _{(i,k,B)\in P_h}c_{ikB}^h\). To this end, it is sufficient to prove that on each machine h,

$$\begin{aligned} \sum _{B\in S_h}\sum _{j\in B} C_j= \sum _{(i,k,B)\in P_h}c_{ikB}^h \end{aligned}$$

then, \(\sum _{j\in N}C_j=\sum _{h=1}^m\sum _{B\in S_h}\sum _{j\in B}C_j\), and the result follows. Consider machine h and sequence \(S_h\) with the corresponding path \(P_h=[(i_0,k_0,B_0),(i_1,k_1,B_1),\dots ,(i_t,k_t,B_t)]\), with \(i_0=1\), \(k_t=n+1\). For the first arc \((i_0,k_0,B_0)\), either \(B_0=\emptyset \) or \(B_0\ne \emptyset \).

Assume \(B_0=\emptyset \). Let \(p_{B_\ell ,h}\) be the processing time of batch \(B_\ell \) on machine h. The algebraic manipulations proceed similarly to the proof of Property 1:

$$\begin{aligned} \sum _{B\in S_h}\sum _{j\in B}C_j= & {} \sum _{q=1}^t|B_q|\sum _{\ell =1}^qp_{B_\ell ,h}= \sum _{\ell =1}^t(n+1-i_\ell )\,p_{B_\ell ,h}= \sum _{\ell =1}^tc_{i_\ell k_\ell B_\ell }^h, \end{aligned}$$

where the last sum can be extended to all the arcs in \(P_h\), since for the first arc \((1,k_0,\emptyset )\in P_h\), \(c_{1,k_0,\emptyset }^h=0\).

If \(B_0\ne \emptyset \), then, by Eq. (10), \(\sum _{\ell =0}^t|B_\ell |=k_t-i_0=n\); hence, all the jobs of the problem are scheduled on machine h while the other machines must be left idle. The analysis is reduced to a single-machine case, and Property 1 ensures that \(\sum _{B\in S_h}\sum _{j\in B}C_j=\sum _{(i,k,B)\in P_h}c_{ikB}^h\). \(\square \)

Model (13)–(16) can be extended to the parallel-machine case using multi-commodity flow constraints.

$$\begin{aligned} \text {minimize}\quad&\sum _{h=1}^m\sum _{(i,k,B)\in A} c_{ikB}^hx_{ikB}^h \end{aligned}$$
(22)
$$\begin{aligned} \hbox {subject to}&\mathop {\mathop {\sum }\limits _{(k,B):}}\limits _{(i,k,B)\in A}x_{ikB}^h -\mathop {\mathop {\sum }\limits _{(k,B):}}\limits _{(k,i,B)\in A}x_{kiB}^h\nonumber \\&\quad = {\left\{ \begin{array}{ll} 1&{}i=1\\ 0&{}i=2,\dots ,n\\ -1&{}i=n+1 \end{array}\right. }\quad \begin{aligned} h=&1,\dots ,m \end{aligned} \end{aligned}$$
(23)
$$\begin{aligned}&\sum _{h=1}^m\sum _{(i,k,B)\in A} \varvec{a_B} x_{ikB}^h=\mathbf {1} \end{aligned}$$
(24)
$$\begin{aligned}&x_{ikB}^h \in \left\{ 0, 1 \right\} \qquad (i,k,B)\in A,\,h=1,\dots ,m \end{aligned}$$
(25)

Here, \(x_{ikB}^h=1\) if batch B is on the hth path. Flow conservation constraints (23) require that one unit of each commodity is routed from node 1 to node \(n+1\). Constraints (24) enforce the exact partitioning of the whole job set across the arcs belonging to the m paths.

In the column generation framework, with each restricted master optimum, constraint multipliers are computed:

$$\begin{aligned}&u_1^h,u_2^h,\dots ,u_{n+1}^h&\text {for constraints}~(23), h=1,\dots ,m,\\&v_1,v_2,\dots ,v_n&\text {for constraints}~(24). \end{aligned}$$

The reduced cost is then minimized separately for each combination of a pair of indices i, k with \(i<k\) and a machine h, searching for arcs (i, k, B) with reduced costs

$$\begin{aligned} \bar{c}_{ikB^*}^h= & {} \min _B\Bigg \{p_{Bh}(n-i+1)-\sum _{j\in B}v_j:\\&\quad \sum _{j\in B} s_j\le b,\,|B|=k-i\Bigg \}-(u_i^h-u_k^h). \end{aligned}$$

This requires calling NewCols m times, once per machine, since the LPT ordering on each machine is different, and thus so is the state space \(g_r(\tau ,\ell )\). Hence, the running time for pricing increases to \(\mathcal O(m\max (n^3,n^2b))\).

A somewhat better situation arises in the case of identical parallel machines, with problem \(Pm|{p-\text {batch}, s_j\le b}|\sum C_j\). Since each job j has the same processing time \(p_j\) on every machine, the state space \(g_r(\tau ,\ell )\) used for pricing is common to all the machines, and a slightly modified version of NewCols can do the entire pricing, still keeping the running time within \(\mathcal O(\max (n^3,n^2b))\). The procedure is reported in Algorithm 4. The key observation is that the \(p_B\) and \(\sum _{j\in B}v_j\) components of the reduced costs \(c_{ikB}^h\) are machine-independent, and only the largest difference \(\Delta u_{ik}=\max _h\{(u_i^h-u_k^h)\}\) is strictly needed in order to compute the minimum reduced costs. Such largest differences are precomputed in time \(\mathcal O(mn^2)\) on line 3. For any (i, k, B), let r be the smallest index such that \(p_r=p_B\), and let \(B_r=B_r(b,k-i)\); then, similarly to what was proved in Property 2:

$$\begin{aligned} \begin{aligned} \bar{c}_{ikB}^h=&(n-i+1)p_B-\sum _{j\in B}v_j-(u_i^h-u_k^h)\ge \\ \ge&(n-i+1)p_{B_r}-g_r(b,k-i)-(u_i^h-u_k^h)\ge \\ \ge&(n-i+1)p_{B_r}-g_r(b,k-i)-\Delta u_{ik}. \end{aligned} \end{aligned}$$
Algorithm 4: NewCols for identical parallel machines
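
The precomputation of the largest multiplier differences \(\Delta u_{ik}\) (line 3 of Algorithm 4) can be sketched as follows; the triple loop makes the \(\mathcal O(mn^2)\) cost explicit. The data layout is an illustrative assumption.

```cpp
#include <algorithm>
#include <vector>

// Precompute Delta_u[i][k] = max_h (u_i^h - u_k^h) for all 1 <= i < k <= n+1.
// u[h][i] holds the dual of the flow constraint of node i for machine h
// (h = 0..m-1, i = 1..n+1; index 0 unused).
std::vector<std::vector<double>> maxDeltaU(
    const std::vector<std::vector<double>>& u) {
    const int m = (int)u.size();
    const int N = (int)u[0].size();            // n + 2 entries per machine
    std::vector<std::vector<double>> du(N, std::vector<double>(N, -1e100));
    for (int i = 1; i < N; ++i)
        for (int k = i + 1; k < N; ++k)
            for (int h = 0; h < m; ++h)
                du[i][k] = std::max(du[i][k], u[h][i] - u[h][k]);
    return du;
}
```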

Finally, note that also taking into account different capacities for each machine, or even different job sizes on each machine, simply requires one to specify the knapsack family used. Details are omitted for the sake of concision.

4 Computational results

The proposed algorithms discussed in the previous sections have been tested on randomly generated instances. For generating job data, the same approach as in Uzsoy (1994) and Rafiee Parsa et al. (2016) has been used. Specifically, all job processing times are drawn from a uniform distribution \(p_j \in \left[ 1, 100\right] \), while job sizes \(s_j\) are drawn from four possible uniform distributions, labeled by \(\sigma \in \{\sigma _1,\sigma _2,\sigma _3,\sigma _4\}\):

$$\begin{aligned} \begin{aligned} \sigma _1:&s_j\in \left[ 1, 10 \right]&\qquad \sigma _3:&s_j\in \left[ 3, 10 \right] \\ \sigma _2:&s_j\in \left[ 2, 8 \right]&\qquad \sigma _4:&s_j \in \left[ 1, 5 \right] . \end{aligned} \end{aligned}$$

In both Uzsoy (1994) and Rafiee Parsa et al. (2016), the machine capacity is fixed at \(b=10\).

Since the pricing procedure has a pseudo-polynomial running time, instances with \(b=30\) and \(b=50\) have also been generated in order to assess how the procedure behaves with a larger capacity. Single-machine instances have been generated with \(n\in \{20,40,60,80,100\}\) and with all four \(\sigma \) size distributions. For each combination of n, \(\sigma \) and b, 10 random instances have been generated.

With the same job data, the corresponding instances of the parallel-machine problem \(Pm|{p-\text {batch}, s_j\le b}|\sum C_j\) have been solved for \(m=2,3,5\) identical machines.

For testing the heuristics in the parallel-machine case and the branch and price exact approach for the single machine, only \(b=10\) instances have been used.

All the tests were run in a Linux environment on an Intel Core i7-6500U CPU @ 2.50GHz processor; the C++ language was used for coding the algorithms, and CPLEX 12.8, called directly from the C++ environment using the CPLEX callable libraries, was used to solve relaxed and mixed-integer programs. The source code and instances are available.

Results for the price and branch heuristic in both the single-machine and parallel-machine cases are discussed in Sect. 4.1, whereas Sect. 4.2 deals with the results of the exact branch and price procedure in the single-machine case.

4.1 Evaluation of the heuristic algorithms

Both the column generation-based lower bound \(\mathtt{CG-LB}\) and the objective value of the heuristic solution \(\mathtt{CG-UB}\) have been evaluated. As far as the quality of the lower bound is concerned, the continuous relaxation of model (1)–(9) is not a realistic competitor, zero being the typical value found by CPLEX at the root branching node. A more meaningful comparison can be performed against the combinatorial lower bound proposed by Uzsoy (1994). This bound is based on a relaxation of \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) to a preemptive problem on b parallel machines (refer to Uzsoy 1994 for details). This lower bound is referred to as PR in the following.

As far as the evaluation of \(\mathtt{CG-UB}\) is concerned, it was difficult to compare the obtained results with the known literature, as neither the test instances nor the computer codes used by Uzsoy (1994) and Rafiee Parsa et al. (2016) have been made available. Hence, some comparisons have been made with the results of Rafiee Parsa et al. (2016) using instances of the same type, but for this reason the comparison has to be taken with some care. On the other hand, when CPLEX is fed with model (1)–(9) and given some time, its internal heuristics do generate a number of heuristic solutions, although it has no chance of certifying optimality. Hence, CPLEX has been run on some sets of instances in order to obtain heuristic solutions with a time limit of 300 s.

The times required to compute \(\mathtt{CG-LB}\) and \(\mathtt{CG-UB}\) are reported separately. The gap between \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) is evaluated as

$$\begin{aligned} gap = \frac{\mathtt{CG-UB}- \mathtt{CG-LB}}{\mathtt{CG-UB}} \cdot 100 \%. \end{aligned}$$

4.1.1 Single machine

Tables 1, 2 and 3 show the results over an increasing number of jobs with batch capacity \(b=10\), 30 and 50, respectively; the \(\mathtt{CG-UB}\) was computed using CPLEX with a time limit of 60 s. Values are shown as the average over each 10-instance group for the time, and as the average, maximum (worst) and minimum (best) over each 10-instance group for the gap. Column opt reports the number of instances (out of 10) for which the solution can be certified to be optimal, i.e., for which \(\mathtt{CG-UB}=\mathtt{CG-LB}\). The comparison between the \(\mathtt{CG-LB}\) value and Uzsoy’s PR lower bound is also reported, computing the average, maximum and minimum over each 10-instance group of the ratio \(\mathtt{CG-LB}/ \texttt {PR}\).

Table 1 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 10\)

In Table 1, it can be seen that with \(b=10\), the computation of \(\mathtt{CG-LB}\) is fast, with average CPU time below 1 s for almost all job counts. The \(\sigma _4\) instances are the most time-demanding, with the only average computation time above 1 s; this is because a larger set of columns is usually generated on such instances. The computation of \(\mathtt{CG-UB}\) is, as expected, the heaviest part of the procedure. However, the CPLEX time limit is reached only in the cases \(n=80,100\) with \(\sigma =\sigma _4\). Again, \(\sigma _4\) instances were the most demanding of CPU time, because of the larger set of columns to be handled. The certified solution quality was very good, with an average optimality gap usually below \(1.5\%\), and only one case (\(n=80\), \(\sigma =\sigma _4\)) above \(5\%\).

From Table 1, it can be easily seen that \(\mathtt{CG-LB}\) performance is much better than PR in every combination, ranging from an average \(9 \%\) gain when \(n = 100\) and \(\sigma = \sigma _4\) to an average \(29 \%\) when \(n = 20\) and \(\sigma = \sigma _4\). These values also suggest that \(\texttt {PR}\) performs better for large n; in fact, when a high number of batches are required in the feasible solutions, the usually weak parallel-machine relaxation of PR becomes slightly stronger.

Table 2 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 30\)

From Table 2, it can be seen that the CPU time for \(\mathtt{CG-LB}\) increases; this is expected, since a larger number of possible batches are generated with an increase in capacity. The larger reduced master problems obviously also affect the computation of \(\mathtt{CG-UB}\), which reaches the time limit in all cases for \(n=80,100\). The average optimality gaps worsen, but the largest increase is not found on \(\sigma _4\) instances; instead, it affects \(\sigma _1\) instances more heavily, especially for large n.

Overall, increasing the capacity also increases the distance between the two lower bounds \(\mathtt{CG-LB}\) and PR; \(\mathtt{CG-LB}\) performs better in every combination, ranging from an average \(10 \%\) gain when \(b = 30\), \(n = 100\) and \(\sigma = \sigma _3\), to an average of \(81 \%\) when \(b = 30\), \(n = 20\) and \(\sigma = \sigma _4\). This is reasonable, since PR is based on a preemptive relaxation to b parallel machines, and allowing the splitting of jobs on more machines weakens the relaxation.

Table 3 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 50\)

Table 3 shows the results of the tests with capacity \(b=50\), which confirm the impact of b. The instances belonging to class \(\sigma _4\) are still the most computationally demanding, both for lower bounding and for the heuristic solution. Instances with \(\sigma =\sigma _1\) are the worst in terms of solution quality—with the exception of small 20-job instances—but, curiously, the gap decreases on \(n=80\) instances when passing from \(b=30\) to \(b=50\). The worst average gap is reached with \(n=100\) and \(\sigma =\sigma _1\) (\(15.66\%\)). Also, PR worsens considerably with respect to \(\mathtt{CG-LB}\).

Table 4 Comparison between HMMAS (values from Rafiee Parsa et al. 2016) and \(\mathtt{CG-UB}\) algorithms
Table 5 Comparison between CPLEX-UB (300 s) and \(\mathtt{CG-UB}\)

Rafiee Parsa et al. (2016) provide a hybrid max-min ant system (HMMAS) that is, to the writers’ knowledge, a state-of-the-art heuristic. A very recent paper on the same problem has been published by the same authors (Rafiee Parsa et al. 2019), where a new approach, a hybrid neural network (HNN), is proposed. In this new paper, a comparison with HMMAS is presented and, as in the previous case, only capacity \(b=10\) is considered; a statistical analysis is also presented. The quality of the results with the new procedure HNN is not better than with HMMAS; on the contrary, the average quality appears to be slightly worse. Neither the source code nor the tested instances appear to be currently available even though this new paper was recently published; hence, an attempt to compare \(\mathtt{CG-UB}\) with HMMAS was made by testing \(\mathtt{CG-UB}\) on generated instances of the same type and size as those used in Rafiee Parsa et al. (2016). Moreover, as in Rafiee Parsa et al. (2016), only the capacity \(b=10\) was investigated, and the comparison was limited to this value. With the caveat that such a comparison is necessarily imprecise, Table 4 points to the following situation: the performance of \(\mathtt{CG-UB}\), evaluated against Uzsoy’s lower bound PR, seems to be very similar to that of HMMAS. Thus, it can be speculated that the two algorithms could give similar results for the upper bound when run on the same instance set. The availability of \(\mathtt{CG-LB}\) allows us to confirm a narrower optimality gap for \(\mathtt{CG-UB}\).

Table 6 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 10\) and 2 parallel machines
Table 7 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 10\) and 3 parallel machines
Table 8 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 10\) and 5 parallel machines
Table 9 Comparison of exact approaches

The quality of \(\mathtt{CG-UB}\) has been compared to the quality of the heuristic solution reached by CPLEX (CPLEX-UB) after 300 s of computation using model (1)–(9). The CPLEX optimality gap is generally well above \(90\%\) because its lower bound is zero or almost zero. Nevertheless, using the proposed stronger lower bound, a more realistic optimality gap can be computed for CPLEX as

$$\begin{aligned}&\frac{{\texttt {CPLEX-UB}}-\mathtt{CG-LB}}{{\mathtt{UB}}^*}\cdot 100\%,\\&\qquad \hbox {with }\,{\mathtt{UB}^*}=\min \{{\texttt {CPLEX-UB}},\mathtt{CG-UB}\}. \end{aligned}$$

The gap for \(\mathtt{CG-UB}\) is recomputed as \((\mathtt{CG-UB}-\mathtt{CG-LB})/\mathtt{UB^*}\cdot 100\%\) for uniformity.

The comparison is reported in Table 5, in terms of average, worst and best gap, as in the previous comparison. Column \(\#\text {win}\) counts the number of instances (out of 10) for which each algorithm achieves the best solution. In the case of a draw, a “win” is counted for both, so the two columns can sum to more than 10. Instances with \(n=20\), 40, 60, 80 and \(b=10\) have been tested. CPLEX ran for the full 300 s on all instances, without proving optimality for any of them. \(\mathtt{CG-UB}\) ran with the same 60 s time limit as in Table 1. Basically, except for the small \(n=20\) instances, the CPLEX solution is consistently worse than \(\mathtt{CG-UB}\).

4.1.2 Parallel machines

With the same data, the \(Pm|{p-\text {batch}, s_j\le b}|\sum C_j\) problem has been solved with \(m=2,3\) and 5 machines. The tests have been limited to the case \(b=10\). The time limit for the branch and bound phase of the heuristic was raised to 180 s. The results are reported in Tables 6, 7 and 8. Increasing the number of machines has a very mild impact on the CPU time needed for computing the lower bound. The growth of the computational cost is much higher for the branch and bound phase, with a certain variability over the four classes of instances, classes \(\sigma _1\) and \(\sigma _4\) exhibiting the largest growth. Again, class \(\sigma _4\) hit the time limit on all the instances. The quality of the solution, as measured by the percentage gap, does not suffer seriously, except for the case \(n=100\), \(m=5\), \(\sigma = \sigma _4\). The worst average gap of \(14.09\%\) is caused by a single instance with a very large gap of \(81.62\%\); if a larger but still acceptable time limit of 300 s is allowed, the average gap for this class decreases to \(4.63\%\) (max gap \(12.56\%\)).

Uzsoy’s bound \(\texttt {PR}\) is easily extended to the parallel-machine case, allowing a relaxation to mb parallel machines. Tables 6, 7 and 8 also compare \(\mathtt{CG-LB}\) with PR extended to the parallel-machine case. The ratio between the two bounds is apparently unaffected by the increase in m.

4.2 Evaluation of the branch and price approach

The branch and price exact algorithm has been tested on the single-machine \(b=10\) generated instances. The reference algorithm for the exact approaches on \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) is the branch and bound of Azizoglu and Webster (2000), which was developed for the weighted version \(1|{p-\text {batch}, s_j\le b}|\sum w_jC_j\), but the authors also reported results for the unweighted version. In the latter case, the lower bound of Azizoglu and Webster reduces to Uzsoy’s bound. The branch and price procedure presented in Sect. 2.4 is based on the same branching scheme as Azizoglu and Webster (2000), with the same dominance conditions. The comparison considered here is between:

  • The branch and price equipped with the proposed lower bound \(\mathtt{CG-LB}\) obtained by column generation and the \(\mathtt{CG-UB}\) heuristic for the root node upper bound—see Sect. 2.4.

  • The branch and bound based on the same branching scheme, with the same dominance conditions, equipped with:

    • an evaluation of the two lower bounds of increasing complexity described in Azizoglu and Webster (2000) Sect. 3, using the better of the two as the node lower bound;

    • the procedure described in Azizoglu and Webster (2000) Sect. 2.4 for the root node upper bound and evaluated at every non-leaf node to further improve the quality of the upper bound.

  • The node exploration policy in both algorithms was kept depth-first.

The results are reported in Table 9 for instances with up to 40 jobs. Cumulative results for all 40 instances are reported by number of jobs, across job size distributions. Specifically, for times, nodes and gap, the average value is reported, while for the number of found optima, the total sum over the 40 instances is reported.

Table 10 Comparison (\(\texttt {GAP}=100\cdot (\texttt {UB}-\texttt {OPT})/\texttt {OPT}\)) between \(\mathtt{CG-UB}\) and real optima with \(b = 10\)
Table 11 Results for \(\mathtt{CG-UB}\) and \(\mathtt{CG-LB}\) with \(b = 50\) and \(\sigma = \sigma _5\) considering \(m = \{1, 2, 3, 5\}\) parallel machines

Both exact methods were run with a 900 s time limit. The table compares the CPU time and the number of open nodes. Column Opt counts the number of certified optima obtained within the time limit. On the left part of the table, the results for the branch and price equipped with \(\mathtt{CG-LB}\) and \(\mathtt{CG-UB}\) are reported; the right side shows the results for the branch and bound equipped with a lower bound and the heuristic by Azizoglu and Webster.

By allowing the branch and price to run without a time limit, all optimal values of the 40-job instances were also collected, even though a few instances required several hours of computation; hence, the GAP columns report the percentage gap \(100\cdot (\texttt {UB}-\texttt {OPT})/\texttt {OPT}\) between the best upper bound \(\texttt {UB}\) obtained within the time limit and the exact optimum \(\texttt {OPT}\) for both algorithms.

The branch and bound has impressively fast node processing, since Uzsoy’s bound is very cheap to compute. As the number of jobs increases, however, the more accurate lower bound obtained by column generation allows the branch and price to outperform the competitor. The branch and price is able to optimally solve all instances with up to 30 jobs and more than half of the 40-job instances, with optimality gaps mostly below \(1\%\).

5 Final remarks

In this paper, column generation techniques for solving the \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) problem have been explored, generalizing such techniques to problems with parallel machines. The exponential-size model (13)–(16), handled by means of column generation, allows one to find—to the authors’ knowledge—the tightest known lower bound for \(1|{p-\text {batch}, s_j\le b}|\sum C_j\). Embedded in a simple price and branch approach, it achieves high-quality solutions for instances with up to 100 jobs, with certified optimality gaps. Thus, it can be claimed that model (13)–(16) is strong: its relaxation gives a sharp bound, and the columns generated can be effectively composed into high-quality feasible solutions. The comparison with state-of-the-art (meta)heuristics like HMMAS is admittedly problematic because of the lack of available code and instances, but the price and branch heuristic is, in the authors’ view, at least as accurate as the state-of-the-art heuristics, and faster and simpler, since it mostly relies on an LP/MILP solver, with the addition of some ad hoc code. Embedded in an exact algorithm, the new lower bound allows us to extend the size of solvable \(1|{p-\text {batch}, s_j\le b}|\sum C_j\) instances towards 40 jobs. Having available all optimal solutions for the single-machine problems with up to 40 jobs, the actual relative error of the \(\mathtt{CG-UB}\) heuristic on such instances can be seen to be even lower than the figures estimated in the experiments of Sect. 4.1—see Table 10.

Although all the random distributions for job data mentioned in the literature were used in the tests, the interested reader might be worried about the relatively narrow distributions for job sizes given in classes \(\sigma _1,\dots ,\sigma _4\). Hence, some tests were performed with capacity \(b=50\) and a new class \(\sigma _5\), with a wider distribution \(s_j\in [1,50]\). The results of \(\mathtt{CG-UB}\) on such instances, for both the single- and parallel-machine cases, are summarized in Table 11, showing that \(\mathtt{CG-UB}\) still handles such instances within practical time limits, guaranteeing narrow optimality gaps.

The new model relies on Property 1 in order to express the linear objective function by means of “positional” coefficients. Property 2 is crucial for developing an efficient pricing procedure. The approach is flexible enough to be extended to problems with parallel machines with very limited effort, while it is probably not simple to extend it to weighted \(\sum w_jC_j\) objectives.