1 Introduction

A birth-and-death process (BDP) is a stochastic model that is commonly employed for describing changes over time of the size of a population. Its first mathematical formulation is due to Feller [11], followed by important mathematical contributions of Arley and Borchsenius [1] and Kendall [18, 19]. According to the basic assumptions of the model, when the population size at time \(t\) is \(j\), the probability of a single birth occurring during an infinitesimal time interval \((t, t + dt)\) is equal to \(\lambda _{j} dt + o(dt)\) while the probability of a single death is \(\mu _{j} dt + o(dt)\), where \(\lambda _{j} \ge 0\) and \(\mu _{j} \ge 0\) for \(j \ge 0\). If \(p_{j}(t)\) is the probability of observing \(j\) individuals at time \(t\) then

$$\begin{aligned} p_{j}(t + dt) = \lambda _{j - 1} dt p_{j - 1}(t) + \mu _{j + 1} dt p_{j + 1}(t) + (1 - (\lambda _{j} + \mu _{j}) dt) p_{j}(t) + o(dt) \end{aligned}$$

If we subtract \(p_{j}(t)\) from both sides of the equation, divide by \(dt\), and then let \(dt\) tend to zero, we obtain the well-known BDP differential equation

$$\begin{aligned} p_{j}^{\prime }(t) = \lambda _{j - 1} p_{j - 1}(t) + \mu _{j + 1} p_{j + 1}(t) - (\lambda _{j} + \mu _{j}) p_{j}(t) \end{aligned}$$
(1.1)

By assuming that at time zero the size of the population was \(i \ge 0\), that is \(p_{i}(0) = 1\), we have the initial condition required to solve the differential equation (1.1).

In this article we will focus on the simple (linear) BDP without migration [19], defined by \(\lambda _{j} = j\lambda \) and \(\mu _{j} = j\mu \). With this particular choice of parameters we have \(\lambda _{0} = \mu _{0} = 0\), i.e. a population that reaches size zero remains at zero for all subsequent \(t \ge 0\). The rate of growth does not increase faster than the population size, and therefore \(\sum _{j = 0}^{\infty } p_{j}(t) = 1\) [12, Chapter 17, Section 4]. When \(i > 0\) the population becomes extinct if it reaches size \(j = 0\) at some time \(t > 0\). Obviously \(i\), \(j\), \(t\), and the basic birth and death rates are not allowed to be negative.
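The model lends itself to direct event-by-event simulation: when the current size is \(n\), the waiting time to the next event is exponential with rate \(n(\lambda + \mu )\), and the event is a birth with probability \(\lambda / (\lambda + \mu )\). A minimal Gillespie-style sketch in Julia follows; the function name and interface are illustrative, not those of any particular package.

```julia
using Random: randexp

# Minimal Gillespie-style simulation of the linear BDP (λ_j = jλ, μ_j = jμ).
# Illustrative sketch only; not the interface of any particular package.
function simulate_bdp(i::Int, t::Real, λ::Real, μ::Real)
    n, s = i, 0.0
    while n > 0
        s += randexp() / (n * (λ + μ))        # exponential waiting time
        s > t && break                        # next event falls after time t
        n += rand() < λ / (λ + μ) ? 1 : -1    # birth w.p. λ/(λ+μ), else death
    end
    return n                                  # population size at time t
end
```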

What makes this model particularly attractive for real applications is the fact that its transition probability is available in closed form [2, Chapter 8] and we could, in principle, easily evaluate it. By defining

$$\begin{aligned} \begin{aligned}&\phi (t, \lambda , \mu ) = \frac{e^{(\lambda - \mu ) t} - 1}{\lambda e^{(\lambda - \mu ) t} - \mu }, \quad \alpha (t, \lambda , \mu ) = \mu \phi (t, \lambda , \mu ), \quad \beta (t, \lambda , \mu ) = \lambda \phi (t, \lambda , \mu )\\&\gamma (t, \lambda , \mu ) = 1 - \alpha (t, \lambda , \mu ) - \beta (t, \lambda , \mu ) = 1 - (\lambda + \mu ) \phi (t, \lambda , \mu ) \end{aligned} \end{aligned}$$

and assuming that at time 0 the size of the population was \(i > 0\), the probability \(p_{j}(t)\) of observing \(j\) units at time \(t\) is

$$\begin{aligned} p_{j}(t) = \sum _{h = 0}^{m} \left( {\begin{array}{c}i\\ h\end{array}}\right) \left( {\begin{array}{c}i + j - h - 1\\ i - 1\end{array}}\right) \alpha (t, \lambda , \mu )^{i - h}\, \beta (t, \lambda , \mu )^{j - h}\, \gamma (t, \lambda , \mu )^{h}, \quad \lambda \ne \mu \end{aligned}$$
(1.2)

$$\begin{aligned} p_{j}(t) = \sum _{h = 0}^{m} \left( {\begin{array}{c}i\\ h\end{array}}\right) \left( {\begin{array}{c}i + j - h - 1\\ i - 1\end{array}}\right) \left( \frac{\lambda t}{1 + \lambda t}\right) ^{i + j - 2 h} \left( \frac{1 - \lambda t}{1 + \lambda t}\right) ^{h}, \quad \mu = \lambda \end{aligned}$$
(1.3)

where \(t\), \(\lambda \), and \(\mu \) are to be considered strictly positive if not otherwise specified and \(m = \min (i, j)\). Eqs. (1.4)–(1.8) give the analogous expressions for the special cases in which some of these quantities are zero. The probability of the population being extinct at time \(t\) is

$$\begin{aligned} p_{0}(t) = \alpha (t, \lambda , \mu )^{i} = \left\{ \begin{array}{ll} \left( \dfrac{\mu (e^{(\lambda - \mu ) t} - 1)}{\lambda e^{(\lambda - \mu ) t} - \mu }\right) ^{i} &{} \quad \text {if } \lambda \ne \mu \\ \left( \dfrac{\lambda t}{1 + \lambda t}\right) ^{i} &{} \quad \text {if } \mu = \lambda \\ (1 - e^{-\mu t})^{i} &{} \quad \text {if } \lambda = 0\\ 0 &{} \quad \text {if } \mu = 0 \end{array} \right. \end{aligned}$$
(1.9)–(1.12)

In the majority of applications direct evaluation of Eqs. (1.2)–(1.12) is sufficient. However, for particular values of process parameters, Eqs. (1.2) and (1.3) are numerically unstable (Fig. 1) and alternative methods are needed [28].

Fig. 1

Numerical relative error of the log-probability as evaluated by direct application of Eqs. (1.2) and (1.3). For this particular example we set \(i = 25\), \(j = 35\), \(t = 2\), \(\lambda = 1\) and evaluated the log-probability as a function of \(\mu \). We computed correct values with Maple™ 2018.2 [21] using 100 significant digits. Relative error is defined as \(|1 - \log \hat{p}_{j}(t) / \log p_{j}(t)|\) where \(\hat{p}_{j}(t)\) is the numerically evaluated transition probability

A possible approach is the algorithm introduced by Crawford and Suchard [5], based on the continued fraction representation of Murphy and O’Donohoe [22], or the saddlepoint approximation method of Davison et al. [8]. For this particular case, where the exact closed form is known, we will show that a simpler and more efficient method is available.

The remainder of the paper is organized as follows. In Sect. 2 we introduce the problem and find the parameter sets for which it is ill-conditioned. In Sect. 3 we rewrite the transition probability in terms of a Gaussian hypergeometric function and find in Sect. 4 a three-term recurrence relation (TTRR) for its computation. In Sect. 5 we extend the results to the log-likelihood function, its gradient, and its Hessian matrix. In Sect. 6 we apply our method to both simulated and real data. In Sect. 7 we conclude the article with a brief discussion.

2 Numerical stability

We will always assume that all basic mathematical operations (arithmetic, logarithms, exponentiation, etc.) are computed with a relative error bounded by a value \(\epsilon \) that is close to zero and small enough for any practical application [25]. Following this assumption, and after taking floating-point arithmetic [15] into consideration, Eqs. (1.4)–(1.12) can be considered numerically stable and will not be discussed further. We will instead focus our attention on the series (1.2) and (1.3), assuming all variables to be strictly positive, including \(j\).

Suppose we are interested in the value \(s_{m} = \sum _{h=0}^{m} u_{h}\) and that we use a naïve recursive summation algorithm for its computation, that is

$$\begin{aligned} s_{0} = u_{0}, \quad s_{n} = s_{n - 1} + u_{n}, \quad n = 1, \ldots , m \end{aligned}$$

The relative condition number of this algorithm is [27]

$$\begin{aligned} \rho _{m} = \rho _{m}^{A} + \rho _{m}^{R} = \frac{\sum \nolimits _{h=0}^{m} | u_{h} |}{\left| \sum \nolimits _{h=0}^{m} u_{h}\right| } + \frac{\sum \nolimits _{n=1}^{m} | s_{n} |}{\left| \sum \nolimits _{h=0}^{m} u_{h}\right| } \end{aligned}$$

where \(\rho _{m}^{A}\) is associated with perturbations in the value of the addends while \(\rho _{m}^{R}\) is associated with rounding errors in the arithmetic operations. Note that when \(u_{h} \ge 0\) for all \(h\), \(\rho _{m}^{A} = 1\) and the condition number depends only on rounding errors. With a compensated summation algorithm [16], such as the one sketched below, we might significantly reduce the numerical error and evaluate the sum accurately. However, when the addends alternate in sign, the condition number can be of large magnitude and the problem is numerically unstable even when compensating for rounding errors. In our case it is likely that the magnitude of the binomial coefficients makes \(\rho _{m}^{A}\) a ratio between a very large number and a probability that is instead close to zero. We will now find the conditions under which the sums (1.2) and (1.3) are alternating in sign.
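For reference, here is a minimal Julia sketch of compensated summation in the style of Kahan [16]; it reduces the rounding-error component \(\rho _{m}^{R}\) but cannot rescue a problem whose \(\rho _{m}^{A}\) is large.

```julia
# Compensated (Kahan) summation: a minimal sketch.
# It controls rounding errors (ρ_m^R) but not cancellation (ρ_m^A).
function kahan_sum(u::AbstractVector{<:AbstractFloat})
    s = zero(eltype(u))       # running sum
    c = zero(eltype(u))       # compensation for lost low-order bits
    for x in u
        y = x - c
        t = s + y             # low-order bits of y are lost here...
        c = (t - s) - y       # ...and recovered here
        s = t
    end
    return s
end
```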

Proposition 2.1

For all \(\lambda \ge 0\) and \(\mu \ge 0\) the function

$$\begin{aligned} \phi (t, \lambda , \mu ) = \frac{e^{(\lambda - \mu ) t} - 1}{\lambda e^{(\lambda - \mu ) t} - \mu } \end{aligned}$$

is zero if and only if \(t = 0\). It is always positive otherwise.

Proof

If \(\mu \ne \lambda \) and \(t = 0\) the numerator \(e^{(\lambda - \mu ) t} - 1\) is equal to zero but the denominator is not. When \(\mu = \lambda \) the function becomes

$$\begin{aligned} \lim _{\mu \rightarrow \lambda } \frac{e^{(\lambda - \mu ) t} - 1}{\lambda e^{(\lambda - \mu ) t} - \mu } = \frac{t}{1 + \lambda t} \end{aligned}$$

For all \(\lambda \ge 0\), it is equal to zero if \(t = 0\) and positive otherwise. Assume now \(t > 0\). When \(\lambda > \mu \) we have \(e^{(\lambda - \mu ) t} > 1\) and \(\mu / \lambda < 1\) implying that the numerator and the denominator are always positive. When \(\lambda < \mu \) the numerator \(e^{(\lambda - \mu ) t} - 1\) is negative. The denominator is also negative when \(e^{(\lambda - \mu ) t} < \mu /\lambda \). Since \(\lambda < \mu \) the left hand side is less than one while the right hand side is greater than one, proving the proposition. \(\square \)

Corollary 2.1

Functions \(\alpha (t, \lambda , \mu ) = \mu \phi (t, \lambda , \mu )\) and \(\beta (t, \lambda , \mu ) = \lambda \phi (t, \lambda , \mu )\) are non-negative for all \(\lambda \ge 0\), \(\mu \ge 0\), and \(t \ge 0\).

Proof

This is a direct consequence of Proposition 2.1 and the assumptions \(\lambda \ge 0\) and \(\mu \ge 0\). \(\square \)

Proposition 2.2

Assume \(t > 0\), \(\lambda > 0\), \(\mu > 0\). If \(\mu \ne \lambda \), define \(\xi = \log (\lambda / \mu ) / (\lambda - \mu )\). If \(\mu = \lambda \), define instead \(\xi = 1 / \lambda \). Let \(\gamma (t, \lambda , \mu ) = 1 - (\lambda + \mu ) \phi (t, \lambda , \mu )\). Then

$$\begin{aligned} \gamma (t, \lambda , \mu ) \left\{ \begin{array}{ll}< 0 &{} \quad \text {if } t > \xi \\ > 0 &{} \quad \text {if } t < \xi \\ = 0 &{} \quad \text {if } t = \xi \end{array} \right. \end{aligned}$$

Proof

Rewrite the function \(\gamma (t, \lambda , \mu )\) in its expanded form:

$$\begin{aligned} \gamma (t, \lambda , \mu ) = \frac{\lambda - \mu e^{(\lambda - \mu ) t}}{\lambda e^{(\lambda - \mu ) t} - \mu } \end{aligned}$$

Assume \(\mu \ne \lambda \). If \(t > 0\), we already proved in Proposition 2.1 that the denominator is always positive when \(\lambda > \mu \) while it is always negative when \(\lambda < \mu \). The numerator is positive when \((\lambda - \mu ) t < \log (\lambda / \mu )\), that is when \(t < \xi \) and \(\lambda > \mu \) or \(t > \xi \) and \(\lambda < \mu \). It is zero when \(t = \xi \). From these results follows the first set of inequalities. When \(\mu = \lambda \) the function becomes

$$\begin{aligned} \lim _{\mu \rightarrow \lambda } \frac{\lambda - \mu e^{(\lambda - \mu ) t}}{\lambda e^{(\lambda - \mu ) t} - \mu } = \frac{1 - \lambda t}{1 + \lambda t} \end{aligned}$$

If \(t > 0\) and \(\lambda > 0\) the denominator is always positive. The numerator is zero if \(t = \lambda ^{-1}\), it is positive when \(t < \lambda ^{-1}\), and it is negative otherwise. \(\square \)

Corollary 2.2

When \(t = \log (\lambda / \mu ) / (\lambda - \mu )\) equation (1.2) becomes

$$\begin{aligned} p_{j}(t) = \left( {\begin{array}{c}i + j - 1\\ i - 1\end{array}}\right) \left( \frac{\mu }{\lambda + \mu }\right) ^{i} \left( \frac{\lambda }{\lambda + \mu }\right) ^{j} \end{aligned}$$

When \(t = \lambda ^{-1}\) Eq. (1.3) becomes

$$\begin{aligned} p_{j}(t) = \left( {\begin{array}{c}i + j - 1\\ i - 1\end{array}}\right) \left( \frac{1}{2}\right) ^{i + j} \end{aligned}$$

Proof

This is a direct consequence of the fact that \(\gamma (t, \lambda , \mu ) = 0\) and that \(0^{h}\) is zero for all \(h > 0\) and one for \(h = 0\). \(\square \)

From Corollary 2.2 we have simple closed form solutions when \(\gamma (t, \lambda , \mu )\) is zero and therefore we will not consider this case further. We know from Proposition 2.2 the conditions under which Eqs. (1.2) and (1.3) are alternating in sign and might become numerically unstable. Looking at the example shown in Fig. 1, where \(t = 2\) and \(\lambda = 1\), function \(\gamma (t, \lambda , \mu )\) is non-negative when \(0 < \mu \lesssim 0.2032\). We clearly see from Fig. 1 that the error steadily increases starting at the value \(\mu \approx 0.2032\). Computer algebra systems implementing symbolic computation are obviously a way of evaluating the transition probability since we can, in principle, increase the number of significant digits to an appropriate amount. The computational cost of symbolic computation, however, can be prohibitive for real applications where the probability is usually evaluated multiple times with a large number of addends in the summation. In the next section we will instead find an alternative representation to Eqs. (1.2) and (1.3) that will lead to an algorithm for their accurate and efficient evaluation.

3 Hypergeometric representation

Define

$$\begin{aligned}&\omega (i, j, t, \lambda , \mu ) = \left( {\begin{array}{c}i + j - 1\\ i - 1\end{array}}\right) \mu ^{i} \lambda ^{j} \left( \frac{e^{(\lambda - \mu ) t} - 1}{\lambda e^{(\lambda - \mu ) t} - \mu }\right) ^{i + j} \\&z(t, \lambda , \mu ) = \frac{\gamma (t, \lambda , \mu )}{\alpha (t, \lambda , \mu ) \beta (t, \lambda , \mu )} = \frac{(\lambda - \mu e^{(\lambda - \mu ) t}) (\lambda e^{(\lambda - \mu ) t} - \mu )}{\lambda \mu (e^{(\lambda - \mu ) t} - 1)^{2}} \end{aligned}$$

Note that \(\omega (i, j, t, \lambda , \mu )\) is simply the first term in (1.2). When \(\mu = \lambda \) set

$$\begin{aligned} \omega (i, j, t, \lambda , \lambda )&= \lim _{\mu \rightarrow \lambda } \omega (i, j, t, \lambda , \mu ) = \left( {\begin{array}{c}i + j - 1\\ i - 1\end{array}}\right) \left( \frac{\lambda t}{1 + \lambda t}\right) ^{i + j}\\ z(t, \lambda , \lambda )&= \lim _{\mu \rightarrow \lambda } z(t, \lambda , \mu ) = \left( \frac{1}{\lambda t}\right) ^{2} - 1 \end{aligned}$$

Multiply and divide each term in the series (1.2) by \(\omega (i, j, t, \lambda , \mu )\) to get

$$\begin{aligned} p_{j}(t) = \omega (i, j, t, \lambda , \mu ) \sum _{h = 0}^{m} \frac{(-i)_{h} (-j)_{h}}{(-(i + j - 1))_{h}} \frac{(-z(t, \lambda , \mu ))^{h}}{h!} = \omega (i, j, t, \lambda , \mu )\, {}_{2}F_{1}(-i, -j; -(i + j - 1); -z(t, \lambda , \mu )) \end{aligned}$$
(3.1)

where \((q)_{h}\) is the rising Pochhammer symbol and \({}_{2}F_{1}(a, b; c; y)\) is the Gaussian hypergeometric function [26, Chapter 1]. To evaluate (3.1) it is then sufficient to separately compute the functions \(\omega (i, j, t, \lambda , \mu )\) and \({}_{2}F_{1}(-i, -j; -(i + j - 1); -z(t, \lambda , \mu ))\).
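The factor \(\omega (i, j, t, \lambda , \mu )\) is conveniently computed on the logarithmic scale to avoid premature overflow or underflow. A minimal Julia sketch follows, assuming \(\lambda \ne \mu \), \(i > 0\), and \(j > 0\); the function name is illustrative, and the log-gamma function from the SpecialFunctions.jl package is used for the log-binomial coefficient.

```julia
using SpecialFunctions: loggamma

# Log of ω(i, j, t, λ, μ) and the argument z(t, λ, μ) of the hypergeometric
# function in (3.1). A sketch for λ ≠ μ; the limits for μ = λ are given above.
function logω_and_z(i::Int, j::Int, t::Real, λ::Real, μ::Real)
    w = (λ - μ) * t
    ϕ = expm1(w) / (λ * exp(w) - μ)                           # φ(t, λ, μ) > 0 for t > 0
    logω = loggamma(i + j) - loggamma(i) - loggamma(j + 1) +  # log C(i+j-1, i-1)
           i * log(μ) + j * log(λ) + (i + j) * log(ϕ)
    z = (1 - (λ + μ) * ϕ) / (λ * μ * ϕ^2)                     # γ / (α β)
    return logω, z
end
```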

Partial derivatives of \(\log p_{j}(t)\) are required for computing partial derivatives of the log-likelihood function as we will explicitly show in Sect. 5. The following theorem proves that partial derivatives of \({}_{2}F_{1}(-i, -j; -(i + j - 1); -z(t, \lambda , \mu ))\) depend on hypergeometric functions of similar nature.

Theorem 3.1

Denote with \(u_{x}\) the first-order partial derivative of \(u(x, y)\) with respect to \(x\). Similarly, denote with \(u_{xy}\) the second-order partial derivative with respect first to \(x\) and subsequently \(y\). Then

$$\begin{aligned} u_{x}&= \frac{i j}{i + j - 1}\, {}_{2}F_{1}(-a_{1}, -b_{1}; -(a_{1} + b_{1}); -z)\, z_{x}\\ u_{xy}&= \frac{i j}{i + j - 1} \left[ \frac{a_{1} b_{1}}{i + j - 2}\, {}_{2}F_{1}(-a_{2}, -b_{2}; -(a_{2} + b_{2} + 1); -z)\, z_{x} z_{y} + {}_{2}F_{1}(-a_{1}, -b_{1}; -(a_{1} + b_{1}); -z)\, z_{xy} \right] \end{aligned}$$

where \(u(x, y) = {}_{2}F_{1}(-i, -j; -(i + j - 1); -z(x, y))\), \(x\) and \(y\) denote any two of the variables \(t\), \(\lambda \), and \(\mu \), and \(a_{1} = i - 1\), \(a_{2} = i - 2\), \(b_{1} = j - 1\), and \(b_{2} = j - 2\).

Proof

Differentiate the truncated hypergeometric series in (3.1) term by term with respect to \(z\) and shift the summation index by one:

$$\begin{aligned} \frac{\partial }{\partial z} \sum _{h = 0}^{m} \frac{(-i)_{h} (-j)_{h}}{(-(i + j - 1))_{h}} \frac{(-z)^{h}}{h!} = \frac{i j}{i + j - 1} \sum _{h = 0}^{m - 1} \frac{(-a_{1})_{h} (-b_{1})_{h}}{(-(a_{1} + b_{1}))_{h}} \frac{(-z)^{h}}{h!} \end{aligned}$$

where \(m = \min (i, j)\). The first-order derivatives then follow from the chain rule, \(u_{x} = u_{z} z_{x}\). Apply the same procedure to obtain the second-order partial derivatives. \(\square \)

From Theorem 3.1 we see that, in general, we must be able to accurately evaluate the hypergeometric function

$$\begin{aligned} {}_{2}F_{1}(-a, -b; -(a + b - k); -z) = \sum _{h = 0}^{\min (a, b)} \frac{(-a)_{h} (-b)_{h}}{(-(a + b - k))_{h}} \frac{(-z)^{h}}{h!} \end{aligned}$$
(3.2)

for \(a, b \in \mathbb {N}_{+}\), \(k = 1, 0, -1, -2, \ldots \), and \(z \in \mathbb {R}\).

4 Hypergeometric function \({}_{2}F_{1}(-a, -b; -(a + b - k); -z)\)

The following theorem can be considered the main result of this article.

Theorem 4.1

The hypergeometric function \({}_{2}F_{1}(-a, -b; -(a + b - k); -z)\), as a function of \(b\), is a solution of the three-term recurrence relation (TTRR)

$$\begin{aligned} (a + b + 1 - k) (a + b - k) y_{b + 1} - (a + b - k) (a + b + 1 - k + (a - b) z) y_{b} - b (b - k) z y_{b - 1} = 0 \end{aligned}$$
(4.1)

Proof

The recursion can be obtained by the method of creative telescoping [24, 29]. To prove that it holds, define

$$\begin{aligned} L_{b, h} = \frac{a!}{(a - h)!} \frac{b!}{(b - h)!} \frac{(a + b - k - h)!}{(a + b - k)!} \frac{z^{h}}{h!} \end{aligned}$$

and let

$$\begin{aligned} t_{h} = (a + b + 1 - k) (a + b - k) L_{b + 1, h} - (a + b - k) (a + b + 1 - k + (a - b) z) L_{b, h} - b (b - k) z L_{b - 1, h} \end{aligned}$$

Note that \(\sum _{h} t_{h}\) is the left hand side of Eq. (4.1) because \(y_{b} = \sum _{h} L_{b, h}\). Set

$$\begin{aligned} R_{b, h} = - \frac{(a - k) (a + b - k - h) b h}{(b + 1 - h) (b - h)} \end{aligned}$$

and let

$$\begin{aligned} u_{h} = R_{b, h + 1} L_{b - 1, h + 1} - R_{b, h} L_{b - 1, h} \end{aligned}$$

Sum the previous expression with respect to \(h\) to obtain \(\sum _{h} u_{h} = - R_{b, 0} L_{b - 1, 0} = 0\). We now need to prove that \(t_{h} = u_{h}\) for all \(h\). Start by dividing \(t_{h}\) by \(L_{b - 1, h}\) to obtain

$$\begin{aligned} \frac{(a + b + 1 - k - h) (a + b - k - h) (b + 1) b}{(b + 1 - h) (b - h)} - \frac{b (a + b - k - h) (a + b + 1 - k + (a - b) z)}{b - h} - b (b - k) z \end{aligned}$$

By expanding the polynomial and collecting the terms with respect to \(h\) we get

$$\begin{aligned} \frac{t_h}{L_{b - 1, h}} = -\frac{(a - k) b}{(b + 1 - h) (b - h)} \left[ (1 + z) h^{2} - (a + b - k + (a + b + 1) z) h + a (b + 1) z \right] \end{aligned}$$

Doing the same with the right hand side we get

$$\begin{aligned} \begin{aligned} \frac{u_{h}}{L_{b - 1, h}}&= -\frac{(a - k) b}{(b + 1 - h) (b - h)} \left[ (b + 1 - h ) (a - h) z - h (a + b - k - h)\right] \\&= -\frac{(a - k) b}{(b + 1 - h) (b - h)} \left[ (1 + z) h^{2} - (a + b - k + (a + b + 1) z) h + a (b + 1) z \right] \end{aligned} \end{aligned}$$

proving the equality. \(\square \)

If we divide both sides of Eq. (4.1) by the coefficient of \(y_{b + 1}\), and shift the index by 1, we obtain the forward recursion

$$\begin{aligned} y_{b} = \left( 1 + \frac{(a + 1 - b) z}{a + b - k}\right) y_{b - 1} + \frac{(b - 1) (b - 1 - k) z}{(a + b - k) (a + b - 1 - k)} y_{b - 2} \end{aligned}$$
(4.2)

Starting from

$$\begin{aligned} \begin{aligned} y_{0}&= {}_{2}F_{1}(-a, 0; -(a - k); -z) = 1\\ y_{1}&= {}_{2}F_{1}(-a, -1; -(a + 1 - k); -z) = 1 + \frac{a z}{a + 1 - k} \end{aligned} \end{aligned}$$

we can, in principle, obtain all remaining values up to the required term. Note that \(a \ge 1\) and \(k \le 1\), therefore the denominator \(a + 1 - k\) is strictly positive and the starting values are always well defined. Knowing the values of \(y_{b + 2}\) and \(y_{b + 1}\), for large \(b\), we can also traverse the recursion backward:

$$\begin{aligned} y_{b} = \frac{(a + b + 2 - k) (a + b + 1 - k)}{(b + 1) (b + 1 - k) z} \left( y_{b + 2} - \left( 1 + \frac{(a - b - 1) z}{a + b + 2 - k}\right) y_{b + 1}\right) \nonumber \\ \end{aligned}$$
(4.3)
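As an illustration, the forward recursion (4.2) with the starting values above translates directly into a few lines of Julia. This is a hedged sketch, not the interface of the package discussed in Sect. 6; as established below, the forward direction is appropriate only when the target solution is dominant or pseudo-dominant.

```julia
# Forward evaluation of y_b = ₂F₁(-a, -b; -(a+b-k); -z) via recursion (4.2).
# Sketch only: valid use presumes the solution is dominant or pseudo-dominant
# (see Theorem 4.2 and Fig. 2); assumes a ≥ 1 and k ≤ 1.
function hyp2f1_forward(a::Integer, b::Integer, k::Integer, z::Real)
    y_prev = 1.0                            # y_0 = 1
    b == 0 && return y_prev
    y_curr = 1.0 + a * z / (a + 1 - k)      # y_1
    for n in 2:b
        y_next = (1 + (a - n + 1) * z / (a + n - k)) * y_curr +
                 (n - 1) * (n - 1 - k) * z /
                 ((a + n - k) * (a + n - 1 - k)) * y_prev
        y_prev, y_curr = y_curr, y_next
    end
    return y_curr
end
```

The backward recursion (4.3) can be coded analogously, starting from two values computed at a sufficiently large index.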

Theorem 4.1 proves that our hypergeometric function is a solution of a TTRR. However, Eq. (4.1) also admits a second, linearly independent solution.

Definition 4.1

A solution \(f_{b}\) of a TTRR is said to be a minimal solution if there exists a linearly independent solution \(g_{b}\) such that

$$\begin{aligned} \lim _{b \rightarrow \infty } \frac{f_{b}}{g_{b}} = 0 \end{aligned}$$

The solution \(g_{b}\) is called a dominant solution.

It is well known that, regardless of the starting values, forward evaluation of a TTRR converges to the dominant solution while backward evaluation converges instead to the minimal solution [14, Chapter 4]. We now need to find the conditions under which our hypergeometric function is either the dominant or the minimal solution.

Lemma 4.1

(Poincaré–Perron) Let \(y_{b + 1} + v_{b} y_{b} + u_{b} y_{b - 1} = 0\) and suppose that \(v_{b}\) and \(u_{b}\) are different from zero for all \(b > 0\). Suppose also that \(\lim _{b \rightarrow \infty } v_{b} = v\) and \(\lim _{b \rightarrow \infty } u_{b} = u\). Denote with \(\phi _{1}\) and \(\phi _{2}\) the (not necessarily distinct) roots of the characteristic equation \(\phi ^{2} + v \phi + u = 0\). If \(f_{b}\) and \(g_{b}\) are the linearly independent non-trivial solutions of the difference equation, then

$$\begin{aligned} \limsup _{b \rightarrow \infty } \root b \of {|f_{b}|} = |\phi _{1}|, \quad \limsup _{b \rightarrow \infty } \root b \of {|g_{b}|} = |\phi _{2}| \end{aligned}$$

If \(|\phi _{1}| < |\phi _{2}|\) it is also

$$\begin{aligned} \lim _{b \rightarrow \infty } \frac{f_{b + 1}}{f_{b}} = \phi _{1}, \quad \lim _{b \rightarrow \infty } \frac{g_{b + 1}}{g_{b}} = \phi _{2} \end{aligned}$$

and \(f_{b}\) is the minimal solution while \(g_{b}\) is the dominant solution. If \(|\phi _{1}| = |\phi _{2}|\) the lemma is inconclusive about the existence of a minimal solution.

Proof

See Chapter 8 of Elaydi [10]. \(\square \)

Using Lemma 4.1 we can study the nature of our hypergeometric function as a solution of the TTRR.

Theorem 4.2

\({}_{2}F_{1}(-a, -b; -(a + b - k); -z)\) is a dominant solution of Eq. (4.1) when \(|z| < 1\). It is a minimal solution when \(|z| > 1\). The nature of the solution is unknown when \(|z| = 1\).

Proof

Our TTRR is

$$\begin{aligned} y_{b + 1} - \left( 1 + \frac{(a - b) z}{a + b + 1 - k}\right) y_{b} - \frac{b (b - k) z}{(a + b + 1 - k) (a + b - k)} y_{b - 1} = 0 \end{aligned}$$

Take the limit of the coefficients

$$\begin{aligned} \lim _{b \rightarrow \infty } - \left( 1 + \frac{a - b}{a + b + 1 - k} z\right) = - (1 - z), \quad \lim _{b \rightarrow \infty } - \frac{b (b - k) z}{(a + b + 1 - k) (a + b - k)} = -z \end{aligned}$$

The characteristic equation is \(\phi ^{2} - (1 - z) \phi - z = 0\) with solutions \(\phi _{1} = -z\) and \(\phi _{2} = 1\). When \(|z| < 1\) the solution associated with \(\phi _{1}\) is minimal and the one associated with \(\phi _{2}\) is dominant. The opposite is true when \(|z| > 1\). We will now prove that \({}_{2}F_{1}(-a, -b; -(a + b - k); -z)\) is associated with the characteristic root \(\phi _{2} = 1\). The summation index \(h\) in Eq. (3.2) depends on \(b\) since the upper bound of the series is the minimum between \(a\) and \(b\). Note, however, that variable \(a\) is considered known and fixed to a finite value. When \(b\) goes to infinity the summation index \(h\) in (3.2) does not depend on \(b\) any more and the series is always finite, so that we can safely exchange the limit of the sum with the sum of the limits:

$$\begin{aligned} \lim _{b \rightarrow \infty } \sum _{h = 0}^{\min (a, b)} \frac{(-a)_{h} (-b)_{h}}{(-(a + b - k))_{h}} \frac{(-z)^{h}}{h!} = \sum _{h = 0}^{a} (-a)_{h} \frac{(-z)^{h}}{h!} \lim _{b \rightarrow \infty } \frac{(-b)_{h}}{(-(a + b - k))_{h}} \end{aligned}$$

Using Stirling’s approximation \(n! \sim \sqrt{2 \pi n} (n / e)^{n}\) for large \(n\), we obtain

$$\begin{aligned} \begin{aligned}&\lim _{b \rightarrow \infty } \frac{(-b)_{h}}{(-(a + b - k))_{h}} = \lim _{b \rightarrow \infty } \frac{b^{b + 1/2} (a + b - k - h)^{a + b - k - h + 1/2}}{(b - h)^{b - h + 1/2} (a + b - k)^{a + b - k + 1/2}}\\&\quad = \lim _{b \rightarrow \infty } \left( \frac{b}{b - h}\right) ^{b + 1/2} \left( \frac{a + b - k - h}{a + b - k}\right) ^{b - k + 1/2} = e^{h} e^{-h} = 1 \end{aligned} \end{aligned}$$

from which it follows that

$$\begin{aligned} \lim _{b \rightarrow \infty } {}_{2}F_{1}(-a, -b; -(a + b - k); -z) = \sum _{h = 0}^{a} (-a)_{h} \frac{(-z)^{h}}{h!} = (1 + z)^{a} \end{aligned}$$

and

$$\begin{aligned} \lim _{b \rightarrow \infty } \frac{{}_{2}F_{1}(-a, -(b + 1); -(a + b + 1 - k); -z)}{{}_{2}F_{1}(-a, -b; -(a + b - k); -z)} = 1 = \phi _{2} \end{aligned}$$

The solution is therefore dominant when \(|z| < 1\) and minimal for \(|z| > 1\). When \(|z| = 1\) Lemma 4.1 is inconclusive about the nature of the solution. \(\square \)

Since Lemma 4.1 refers to asymptotic results, Theorem 4.2 is always valid for large values of \(b\). For small values of \(b\), instead, there is a possibility of anomalous behaviour as described by Deaño and Segura [9]. By Definition 4.1 we would expect that the sequence of ratios between a minimal and a dominant solution would be monotonically decreasing to zero for all \(b\). There are cases, however, in which this is not necessarily true. A minimal solution might behave as a dominant solution up to a finite value \(b^{*}\) and then switch to its asymptotic minimal nature only starting at \(b^{*} + 1\).

Definition 4.2

Let \(f_{b}\) and \(g_{b}\) be respectively the minimal and dominant solution of a TTRR as \(b \rightarrow \infty \). \(f_{b}\) is said to be pseudo-dominant for all \(b \le b^{*}\) if the sequence \(\{R_{b} = |f_{b} / g_{b}|\}\) is increasing for \(b \le b^{*}\) but decreasing for \(b > b^{*}\).

Lemma 4.2

(Deaño–Segura) Let \(y_{b + 1} + v_{b} y_{b} + u_{b} y_{b - 1} = 0\) be a recurrence such that, for \(b \ge b^{-}\), \(u_{b} < 0\) and \(v_{b}\) changes sign at \(b^{*} > b^{-} + 1\). Suppose that there exists a solution \(f_{b}\) with fixed pattern of signs for all \(b \ge b^{-}\), the pattern being alternating if \(v_{b} < 0\) for large \(b\) or with constant sign if \(v_{b} > 0\) for large \(b\) (\(f_{b}\) may be minimal). Let \(g_{b}\) be any solution (not minimal) such that

$$\begin{aligned} \frac{g_{b^{*} + 1}}{g_{b^{*}}} = - \psi \frac{f_{b^{*} + 1}}{f_{b^{*}}}, \qquad \psi > 0, \end{aligned}$$

and let \(R_{b} = |f_{b} / g_{b}|\), then for \(b \ge b^{-}\) the following holds depending on the value of \(\psi \):

1. if \(\psi > 1\), then \(R_{b} < R_{b^{*}}\) if \(b \ne b^{*}\);
2. if \(\psi < 1\), then \(R_{b} < R_{b^{*} + 1}\) if \(b \ne b^{*} + 1\);
3. if \(\psi = 1\), then \(R_{b} < R_{b^{*}} = R_{b^{*} + 1}\) if \(b \ne b^{*}, b^{*} + 1\).

Proof

See Deaño and Segura [9]. \(\square \)

According to Lemma 4.2, if \(u_{b}\) is negative and \(v_{b}\) changes sign at index \(b^{*}\), then our asymptotic minimal solution behaves as a dominant solution up to \(b^{*} - 1\) or \(b^{*}\). We must then study the sign of the two coefficients

$$\begin{aligned} u_{b}&= - \frac{b (b - k) z}{(a + b + 1 - k) (a + b - k)}\\ v_{b}&= - \frac{(a + b + 1 - k) + (a - b) z}{a + b + 1 - k} \end{aligned}$$

with \(b \ge 1\), \(a \ge 1\) and \(k \le 1\). Since the denominators are strictly positive, we can simply study the signs of the associated quantities

$$\begin{aligned} u_{b}^{\prime }&= - b (b - k) z\\ v_{b}^{\prime }&= - (a + b + 1 - k) - (a - b) z \end{aligned}$$

\(u_{b}^{\prime }\) is negative when \(z > 0\), positive when \(z < 0\), and zero when \(b = k = 1\). Define \(b^{*} = (z - 1)^{-1} ((z + 1) a + 1 - k)\). \(v_{b}^{\prime }\) is negative when \(z < 1\) and \(b > b^{*}\) or when \(z > 1\) and \(b < b^{*}\). It is obviously positive in the complementary set. The point \(b^{*}\) is the delimiter at which the coefficient \(v_{b}\) switches from positive sign to negative sign or vice versa.

When \(z > 1\) we are under the conditions of Lemma 4.2, therefore the solution is surely minimal for \(b > b^{*} + 1\). It is pseudo-dominant for all \(b < b^{*}\). Not knowing the shape of the linearly independent solution \(g_{b}\), we don’t know if the solution becomes minimal at \(b^{*}\) or \(b^{*} + 1\). Interestingly, when \(z < - (a + 2 - k) / (a - 1)\), we have the opposite behaviour of a positive \(u_{b}\) and \(v_{b}\) changing sign from positive to negative at the same index \(b^{*}\). The regions are highlighted in Fig. 2.

Fig. 2

Nature of the hypergeometric function \({}_{2}F_{1}(-a, -b; -(a + b - k); -z)\) as a solution of TTRR (4.1). The minimum value admissible for \(b\) is one. When \(b = 0\) we can simply use the numerically stable Eqs. (1.9)–(1.12). The dotted curve is given by the equation \(b^{*} = (z - 1)^{-1} ((z + 1) a + 1 - k)\). It is dotted to represent the fact that we don’t know whether the solution becomes minimal at \(b^{*}\) or \(b^{*} + 1\). The curve has a horizontal asymptote at \(b = a\)

Lemma 4.2 does not consider the case of a positive \(u_{b}\) but we conjecture that it might be applied to this case as well. Nevertheless, as shown by the following proposition, we can simply ignore the problem altogether.

Proposition 4.1

For all finite \(\lambda > 0\), \(\mu > 0\), and \(t > 0\), function \(z(t, \lambda , \mu )\) is always greater than -1. It is positive when \(\mu \ne \lambda \) and \(t < \log (\lambda / \mu ) / (\lambda - \mu )\) or when \(\mu = \lambda \) and \(t < \lambda ^{-1}\).

Proof

Rewrite the function \(z(t, \lambda , \mu )\) as

$$\begin{aligned} z(t, \lambda , \mu ) = \frac{\gamma (t, \lambda , \mu )}{\alpha (t, \lambda , \mu ) \beta (t, \lambda , \mu )} = \frac{(\frac{\lambda }{\mu } + \frac{\mu }{\lambda }) e^{(\lambda - \mu ) t} - e^{2 (\lambda - \mu ) t} - 1}{e^{2 (\lambda - \mu ) t} - 2 e^{(\lambda - \mu ) t} + 1} \end{aligned}$$

It is straightforward to show that the function converges to \(-1\) when \(\lambda \), \(\mu \), or \(t\) goes to infinity. The limit is never attained for finite \(\lambda \), \(\mu \), and \(t\). When any of the parameters approaches zero, instead, the function approaches positive infinity. We know from Corollary 2.1 that the denominator \(\alpha (t, \lambda , \mu ) \beta (t, \lambda , \mu )\) is always positive. The sign of the function \(z(t, \lambda , \mu )\) is therefore equal to the sign of \(\gamma (t, \lambda , \mu )\), which is given in Proposition 2.2. The same results apply when \(\mu = \lambda \). \(\square \)

Note that for \(|z| > 1\), as clearly shown in Fig. 2, we have to use either the forward or backward recursion depending on the value of \(b\) that we wish to evaluate. We can simplify our computations and compute the hypergeometric function for all values of \(z \in \mathbb {R}\) by applying the well known symmetric property \({}_{2}F_{1}(-a, -b; -(a + b - k); -z) = {}_{2}F_{1}(-b, -a; -(b + a - k); -z)\). If \(z > 1\) and \(b > a\), swap the two variables to transform a minimal solution into a pseudo-dominant one. If \(z < -1\) and \(b < a\), swap the two variables to transform a (possibly) pseudo-dominant solution into a minimal one. When \(|z| = 1\) the two solutions are neither dominant nor minimal and recursion in both directions is not unstable [13].

All previous results are summarized in Algorithm 1 in Supplementary Material S1. Assuming a constant time for arithmetic operations, the time complexity is simply \(O(m)\), where \(m = \min (a, b)\) is the total number of iterations required. Note that we only use basic arithmetic operations, saving computational time when compared to the more expensive functions appearing in Eqs. (1.2)–(1.3), such as the binomial coefficient. From a computational point of view the TTRR approach is therefore preferable even when the problem is well-conditioned.
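For the transition probability itself we have \(k = 1\) and, by Proposition 4.1, \(z > -1\). Moreover, \(b^{*} > a\) whenever \(z > 1\) and \(k \le 1\), so evaluating the forward recursion with the second argument not exceeding the first stays within the pseudo-dominant region. The following Julia sketch assembles the illustrative helpers defined above (they are not the package API); the full direction logic is handled by Algorithm 1.

```julia
# Log transition probability log p_j(t) = log ω + log ₂F₁ (Eq. (3.1)) for
# λ ≠ μ and i, j > 0. Built on the illustrative helpers logω_and_z and
# hyp2f1_forward sketched earlier; not the package API.
function logtransprob(i::Int, j::Int, t::Real, λ::Real, μ::Real)
    logω, z = logω_and_z(i, j, t, λ, μ)
    a, b = max(i, j), min(i, j)     # ₂F₁ is symmetric in its first two arguments
    F = hyp2f1_forward(a, b, 1, z)  # k = 1; z > -1 by Proposition 4.1
    return logω + log(F)
end
```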

5 Likelihood function

Let \(\varvec{t} = (t_{0}, \ldots , t_{S})^{T}\) be the vector of observation times with \(t_{S} \le t\), \(\varvec{n} = (n_{0}, \ldots , n_{S})^{T}\) be the corresponding observed population sizes, and \(\varvec{\tau } = (\tau _{1}, \ldots , \tau _{S})^{T} = (t_{1} - t_{0}, \ldots , t_{S} - t_{S - 1})^{T}\) be the vector of inter-arrival times. When the process is observed continuously the log-likelihood function is [7, Equation (25)]

$$\begin{aligned} \log \mathscr {L}(\lambda , \mu | \varvec{t}, \varvec{n}) = B_{t} \log \lambda + D_{t} \log \mu - (\lambda + \mu ) X_{t} + \sum _{s = 0}^{S - 1} \log n_{s} \end{aligned}$$
(5.1)

where \(B_{t}\) and \(D_{t}\) are respectively the total number of births and deaths recorded during the time interval \([0, t]\) while \(X_{t} = \sum _{s = 0}^{S} n_{s} \tau _{s + 1}\) is the total time lived in the population during \([0, t]\). By convention we set \(\tau _{S + 1} = t - t_{S}\). From (5.1) we obtain the maximum likelihood estimators (MLEs) of \(\lambda \) and \(\mu \) as

$$\begin{aligned} \hat{\lambda } = \frac{B_{t}}{X_{t}}, \qquad \hat{\mu } = \frac{D_{t}}{X_{t}} \end{aligned}$$
(5.2)
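As a concrete illustration, here is a minimal Julia sketch of the continuous-observation MLEs (5.2); the variable names are ours, with `times` holding the event times \(t_{0}< t_{1}< \cdots < t_{S}\), `sizes` the corresponding population sizes, and `t` the end of the observation window.

```julia
# MLEs (5.2) for a continuously observed BDP: count births B_t and deaths D_t,
# accumulate the total time lived X_t = Σ_s n_s τ_{s+1} with τ_{S+1} = t - t_S.
function mle_continuous(times::Vector{Float64}, sizes::Vector{Int}, t::Float64)
    B = count(s -> sizes[s] > sizes[s - 1], 2:length(sizes))   # births
    D = count(s -> sizes[s] < sizes[s - 1], 2:length(sizes))   # deaths
    τ = vcat(diff(times), t - times[end])                      # holding times
    X = sum(sizes .* τ)                                        # total time lived
    return B / X, D / X                                        # (λ̂, μ̂)
end
```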

From (5.2) it follows that the MLE of the growth rate \(\theta = \lambda - \mu \) is \(\hat{\theta } = \hat{\lambda } - \hat{\mu } = (B_{t} - D_{t}) / X_{t}\). A more challenging situation is encountered when the BDP is observed discretely at fixed time points. Rewrite the probability of transitioning from \(i\) to \(j\) in time \(t\), with birth rate \(\lambda \) and death rate \(\mu \), as \(p(j | i, t, \lambda , \mu )\). Since the BDP is a continuous time Markov chain [19] we can write the likelihood function as

$$\begin{aligned} \mathscr {L}(\lambda , \mu | \varvec{t}, \varvec{n}) = \prod _{s = 1}^{S} p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu ) \end{aligned}$$

Note that the joint likelihood of \(M\) observations of stochastically independent processes, having the same birth and death rates, is simply the product of the \(M\) likelihoods associated with each process. To the best of our knowledge, no closed form solutions for \(\hat{\lambda }\) and \(\hat{\mu }\) are currently available in the discrete case. However, in the case of equidistant sampling, where \(\tau _{s} = \tau \) for all \(s\), we know that [17]

$$\begin{aligned} \hat{\theta } = \frac{1}{\tau }\log \left( \frac{n_{1} + \cdots + n_{S}}{n_{0} + \cdots + n_{S - 1}}\right) \end{aligned}$$
(5.3)
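In code, Eq. (5.3) is a one-liner; `n` is the vector \((n_{0}, \ldots , n_{S})^{T}\) of observed sizes and `τ` the common inter-arrival time (illustrative names).

```julia
# Growth-rate estimator (5.3) under equidistant sampling.
theta_hat(n::AbstractVector, τ::Real) = log(sum(n[2:end]) / sum(n[1:end-1])) / τ
```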

It is easy to show that the first moment of \(\hat{\theta }\) does not exist. Starting with \(S = 1\) we have

$$\begin{aligned} \mathbb {E}[\hat{\theta }] = \frac{1}{\tau } \sum _{j = 0}^{\infty } \log \left( \frac{j}{n_{0}}\right) p(j | n_{0}, t, \lambda , \mu ) \end{aligned}$$

The first term in the summation is not defined because the probability of extinction is strictly positive, unless the process is a pure birth process (see Eqs. (1.9)–(1.12)). For a simple birth-and-death process without migration the population stays extinct once its size reaches zero, therefore the previous result extends to any value \(S > 1\). To estimate \(\lambda \), \(\mu \), and \(\theta \) we must therefore consider only observations in which the population is not immediately extinct at time point \(s = 1\).

To find the maximum likelihood estimators we will use a numerical approach, namely the Newton–Raphson method [4, Chapter 4] applied to the log-likelihood function. To proceed we need its gradient and Hessian matrix, which are

$$\begin{aligned}&\nabla l(\lambda , \mu | \varvec{t}, \varvec{n}) = \nabla \log \mathscr {L}(\lambda , \mu | \varvec{t}, \varvec{n}) = \sum _{s = 1}^{S} \begin{pmatrix} \dfrac{\partial }{\partial \lambda } \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu )\\ \dfrac{\partial }{\partial \mu } \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu ) \end{pmatrix} \end{aligned}$$
(5.4)
$$\begin{aligned}&\varvec{H}(\lambda , \mu | \varvec{t}, \varvec{n}) = \sum _{s = 1}^{S} \begin{pmatrix} \dfrac{\partial ^{2}}{\partial \lambda ^{2}} \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu ) &{} \dfrac{\partial ^{2}}{\partial \lambda \partial \mu } \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu )\\ \dfrac{\partial ^{2}}{\partial \mu \partial \lambda } \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu ) &{} \dfrac{\partial ^{2}}{\partial \mu ^{2}} \log p(n_{s} | n_{s - 1}, \tau _{s}, \lambda , \mu ) \end{pmatrix}\nonumber \\ \end{aligned}$$
(5.5)

with closed form expressions for the partial derivatives of the log-probabilities appearing in (5.4) and (5.5) given in Supplementary Material S2. They can be evaluated with our proposed TTRR approach. Note that (5.4) and (5.5) are sums of piecewise functions with sub-domains inherited from Eqs. (1.2)–(1.8).
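A generic Newton–Raphson iteration built on (5.4) and (5.5) might look as follows; `grad` and `hess` are hypothetical callables returning the gradient and Hessian evaluated at \((\lambda , \mu )\), and safeguards such as step halving or enforcing positive rates are omitted from this sketch.

```julia
using LinearAlgebra: norm

# Newton–Raphson search for a stationary point of the log-likelihood:
# (λ, μ) ← (λ, μ) − H⁻¹ ∇l, iterated until the step is negligible.
function newton_raphson(grad, hess, λ0::Real, μ0::Real; tol = 1e-10, maxiter = 100)
    x = [float(λ0), float(μ0)]
    for _ in 1:maxiter
        step = hess(x[1], x[2]) \ grad(x[1], x[2])   # solve H s = ∇l
        x -= step
        norm(step) < tol && break
    end
    return x
end
```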

6 Applications

We developed a free Julia [3] package called “SimpleBirthDeathProcess” that implements all the results presented so far.

Returning to the example shown in Fig. 1, we can see from Fig. 3 that our method significantly improves the accuracy of the computations. Interestingly, although not entirely unexpectedly, the algorithm has a higher numerical error in the neighbourhood of the special point \(\mu = \lambda \), that is the removable singularity of Eq. (1.2). Note that relative errors for this particular example are always less than \(10^{-10}\) and small enough for any practical application. In Fig. 4 we can see a more general example where points near the line \(\mu = \lambda \) are again associated with higher relative errors. Also in this case they are very small and always less than \(10^{-13}\).

Fig. 3

Numerical relative error of the log-probability evaluated using the hypergeometric representation and the TTRR approach. Parameters are the same as in Fig. 1, that is \(i = 25\), \(j = 35\), \(t = 2\), and \(\lambda = 1\). Relative error is always less than \(10^{-10}\)

Fig. 4

Numerical relative error of the log-probability evaluated using the hypergeometric representation and the TTRR approach. Parameters for this example are \(i = 200\), \(j = 100\), and \(t = 1\). Relative error is always less than \(10^{-13}\)

6.1 Simulated data

We will now study some properties of the maximum likelihood estimators of the birth rate \(\lambda \), death rate \(\mu \), and growth rate \(\theta = \lambda - \mu \). We will use our software package to perform simulations and apply standard Monte Carlo integration to approximate the bias and root mean square error (RMSE) of the MLEs. The total number of simulations is set to \(10^{5}\) in each of the following synthetic experiments.

6.1.1 Constant growth rate

The first study mimics a situation in which both rate parameters are strictly positive. For simplicity we fix the total observation time to \(t = 10\) and assume the process to be observed at \(S\) equidistant time points, that is every \(\tau = t / S\) units of time. To reduce the number of possible combinations to test, we choose birth and death rates so that the expected population size and standard deviation at time \(t\) are approximately proportional to the initial population size. In what follows we will always condition our estimators on populations that are not immediately extinct, as explained in Sect. 5. Results of the simulations when \(\lambda > \mu \) are shown in Table 1 while results of the simulations when \(\lambda < \mu \) are shown in Table 2.

Table 1 Monte Carlo estimates from \(10^{5}\) simulations of a simple BDP where \(\lambda > \mu \). Growth rate \(\theta = \lambda - \mu = 0.0693\) for each row, i.e. the expected population size at time \(t = 10\) is set to be two times the initial population size \(n_{0}\). For each number of time points \(S\), the three rows correspond respectively to a standard deviation of 1.25, 1.5, and 2.0 times the initial population size \(n_{0}\)
Table 2 Monte Carlo estimates from \(10^{5}\) simulations of a simple BDP where \(\lambda < \mu \). Growth rate \(\theta = \lambda - \mu = -0.0693\) for each row, i.e. the expected population size at time \(t = 10\) is set to be half the initial population size \(n_{0}\). For each number of time points \(S\), the three rows correspond respectively to a standard deviation of 0.25, 0.5, and 1.0 times the initial population size \(n_{0}\)

Estimators are generally negatively biased but we also observe situations when \(\lambda < \mu \) in which the bias is positive. The magnitude of the bias of \(\hat{\lambda }\) and \(\hat{\mu }\) is very large when only one time point is used, for which we have \(|\text {Bias}(\hat{\lambda })| \approx \text {RMSE}(\hat{\lambda })\) and \(|\text {Bias}(\hat{\mu })| \approx \text {RMSE}(\hat{\mu })\). Increasing the number of time points \(S\) helps reduce both the bias and the RMSE of \(\hat{\lambda }\) and \(\hat{\mu }\). All estimators obviously perform worse when the standard deviation of the stochastic process is high. What is surprising to us is the observation that \(\hat{\theta }\) has approximately the same performance regardless of the initial population size. Increasing the number of time points also has the counter-intuitive effect of making the estimation of \(\theta \) worse.

6.1.2 Technical replicates

Following the results from the previous section, we want to investigate the performance of the estimators when the stochastic process is observed more than once. As an example, this is a standard setting in dose-response drug screening experiments, where cell counts are observed after a period of incubation and (usually) 3 to 5 technical replicates are produced under the same experimental conditions [23]. For a fair comparison we will use the same simulation parameters as in the previous experiment, the only difference being that we now have three technical replicates instead of one. Results of the simulations when \(\lambda > \mu \) are shown in Table 3 while results of the simulations when \(\lambda < \mu \) are shown in Table 4.

Table 3 Monte Carlo estimates from \(10^{5}\) simulations of three simple BDPs where \(\lambda > \mu \). Growth rate \(\theta = \lambda - \mu = 0.0693\) for each row, i.e. the expected population size at time \(t = 10\) is set to be two times the initial population size \(n_{0}\). For each number of time points \(S\), the three rows correspond respectively to a standard deviation of 1.25, 1.5, and 2.0 times the initial population size \(n_{0}\)
Table 4 Monte Carlo estimates from \(10^{5}\) simulations of three simple BDPs where \(\lambda < \mu \). Growth rate \(\theta = \lambda - \mu = -0.0693\) for each row, i.e. the expected population size at time \(t = 10\) is set to be half the initial population size \(n_{0}\). For each number of time points \(S\), the three rows correspond respectively to a standard deviation of 0.25, 0.5, and 1.0 times the initial population size \(n_{0}\)

As expected, we see a decrease in both bias magnitude and RMSE for \(\hat{\lambda }\) and \(\hat{\mu }\). A small improvement is obtained also for \(\hat{\theta }\). Again, increasing the number of time points allows for a better estimation of \(\lambda \) and \(\mu \) but makes the estimation of \(\theta \) worse. When increasing the number of time points \(S\), the loss (gain) of performance is lower (higher) than in the single observation case of the previous section.

6.2 Real data

As an example application we will use real data from a cancer drug combination experiment originally performed and analysed by Liu et al. [20]. Briefly, two monoclonal antibodies were combined at a concentration ratio of 1:1 to form a mixture. Tested concentrations of the mixture were 0 (no antibody), 0.025, 0.25, 2.5, and 10 \(\upmu \)g/ml. Living cell counts were subsequently measured with fluorescence microscopy at 1, 2, and 3 days. For each time point they performed six technical replicates for concentrations greater than zero and twelve replicates for the control dose of zero. Since the initial number of cells was not available, they estimated it from the data to be on average approximately equal to 23. Following previous studies [6] we will fix for each and every observation an initial cell count of 23, as if it were known in advance. The complete dataset is visually represented in Fig. 5.

Fig. 5

Antibody dataset by Liu et al. [20]. All the observed counts are assumed to originate from the same number of cells \(n_{0} = 23\). Increasing the antibody concentration reduces the growth rate of cancer cells

It is important to note that the dataset consists of 108 independent observations, i.e. counts referring to the same concentration at different time points are not part of the same time series but are, instead, independent realizations of the same stochastic process observed at different times. In our notation, \(S = 1\) and \(\tau = t / S = t\) for each of the 108 measurements. The basic datum \(\varvec{x}_{i}\), \(i = 1, \ldots , 108\), is a vector \((c_{i}, t_{i}, n_{i}(0), n_{i}(t_{i}))^{T}\) where \(c_{i}\) is the tested antibody concentration, \(t_{i}\) is the time in days, \(n_{i}(0)\) is the initial population size set to 23 for each and every observation, and \(n_{i}(t_{i})\) is the final cancer cell count for observation \(i\). For further details about the study and the experimental design we refer to the original article of Liu et al. [20]. To model the data we use an approach similar to that of Crawford et al. [6], that is, a linear model on the logarithmic scale of the basic process rates. Formally we define

$$\begin{aligned} \left\{ \begin{aligned} \log (\lambda _{i})&= \alpha _{\lambda } + \beta _{\lambda } \log (1 + c_{i})\\ \log (\mu _{i})&= \alpha _{\mu } + \beta _{\mu } \log (1 + c_{i}) \end{aligned} \right. ,\; \text { for all } i = 1, \ldots , 108 \end{aligned}$$
(6.1)
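In code, the reparameterization (6.1) amounts to two exponentiated linear predictors; the following is a sketch of the model, not the original analysis script.

```julia
# Rates implied by model (6.1) at antibody concentration c.
rate_λ(c, α_λ, β_λ) = exp(α_λ + β_λ * log1p(c))   # log λ_i = α_λ + β_λ log(1 + c_i)
rate_μ(c, α_μ, β_μ) = exp(α_μ + β_μ * log1p(c))   # log μ_i = α_μ + β_μ log(1 + c_i)
```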

Maximum likelihood estimates and their corresponding standard errors are shown in Table 5. We obtained the estimates by numerically maximizing the log-likelihood function with the BFGS algorithm [4, Chapter 4]. We applied the delta method to the observed Fisher information matrix in order to compute the standard errors of all parameters.

Table 5 Maximum likelihood estimates of model (6.1) based on the antibody dataset

According to our model, increasing the antibody concentration has the double effect of reducing the birth rate and raising the death rate while keeping the overall rate \(\lambda + \mu \) approximately the same. When the dose of the treatment increases, the global growth rate \(\theta \) decreases as a consequence, reaching a negative value at the maximum tested concentration. Interestingly, Crawford et al. [6] obtained values that are slightly different from ours but still very close. In particular, the maximum absolute difference between our estimates of \(\theta \) and theirs is just 0.054. Since their R package birth.death is not available for download any more, we could not replicate the analysis and investigate the discrepancies further. We believe, however, that the observed differences are simply due to numerical errors or to a chosen solution that is a local optimum rather than a global one.

7 Concluding remarks

Maximum likelihood estimators for the basic rates of a simple (linear) birth-and-death process are available in closed form only when the process is observed continuously over time. Numerical methods are currently the only option to draw inferences for discretely observed processes. However, we showed that direct application of the well-known transition probability might be subject to large numerical error. We rewrote the probability in terms of a Gaussian hypergeometric function and found a three-term recurrence relation for its evaluation. Our approach led not only to very accurate approximations but also to a computationally efficient algorithm when compared to the naïve direct summation method.

By means of simulation we observed that the MLEs \(\hat{\lambda }\) and \(\hat{\mu }\) are largely negatively biased. We confirmed the intuition that, to obtain better estimates, it is important to employ a large initial population size, multiple time points, and multiple technical replicates. The actual gains, as one would expect, depend on the magnitude of the basic rates, i.e. the process standard deviation. If only the growth parameter \(\theta = \lambda - \mu \) is of interest, then multiple technical replicates with (surprisingly) only one time point provide the best results. Interestingly, the initial population size seems to affect neither the bias nor the root mean square error of \(\hat{\theta }\).

We also released a free Julia package called “SimpleBirthDeathProcess”. With the help of our tool it is possible to simulate, fit, or just evaluate the likelihood function of a simple BDP. Accurate evaluation of the log-likelihood function will create opportunities for future research, such as implementation of MCMC algorithms for Bayesian inference. As a final note, it might be worth investigating our conjecture that Lemma 4.2 can be extended to TTRRs with a positive coefficient.