1 Introduction

Reactions are inherently discrete and stochastic in nature [12, 15, 16]. Ignoring stochastic fluctuations and the integer nature of molecule counts can result in inappropriate models, which fail especially for the small reaction compartments encountered in cell biology. Traditionally, the stochastic approach describes reaction dynamics using a discrete-state continuous-time Markov chain whose state is the integer-valued number of molecules of each involved species. In this approach, the state vector of the system satisfies the random time change model (RTCM), which defines the reaction counting processes using time-warped Poisson processes [2]. Moreover, the probability mass function of these systems satisfies a set of differential equations referred to in the literature as the chemical master equation (CME) [18, 20]. When the number of molecules of the species in the system of interest is very high, the state vector of the system can be defined by real-valued concentrations instead of integer-valued particle numbers. The dynamics of such systems can be approximately captured through diffusion processes: the state vector satisfies an Itô stochastic differential equation (SDE) known as the chemical Langevin equation (CLE), and the probability density function satisfies the Fokker–Planck equation (FPE) [18, 21]. In the thermodynamic limit, in which the number of molecules of each species and the system volume both approach infinity while the concentrations of the species stay constant, the state of the system is given by the reaction rate equation (RRE) of the traditional deterministic modeling approach.

A major challenge of modeling reaction networks using the CME is the curse of dimensionality: each species of the system adds one dimension to the corresponding CME. Therefore, when the number of reachable states is high, obtaining the numerical solution of the CME is infeasible. To avoid this drawback, different stochastic simulation algorithms (SSAs), such as the Doob–Gillespie algorithm and its variants, have been proposed to generate sample paths of the biochemical reaction system of interest [10, 19, 22]. The computational cost of these algorithms increases with the size and reactivity of the system, and the slow convergence of Monte Carlo estimates usually requires a massive number of simulation runs.
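
For concreteness, the following is a minimal Python sketch of the direct (Doob–Gillespie) method for a generic network; the function names and the example rates are our own illustrative choices, not part of the cited algorithms' specifications.

```python
import numpy as np

def gillespie(x0, stoich, propensities, t_end, rng=None):
    """Doob-Gillespie direct method: returns one sample path (times, states).

    stoich       : (R, M) array; row k is the state-change vector nu_k
    propensities : callable x -> length-R array of propensity values a_k(x)
    """
    rng = rng or np.random.default_rng()
    t, x = 0.0, np.asarray(x0, dtype=int)
    times, states = [t], [x.copy()]
    while t < t_end:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0:                        # absorbing state: nothing can fire
            break
        t += rng.exponential(1.0 / a0)     # waiting time ~ Exp(a0)
        k = rng.choice(len(a), p=a / a0)   # index of the firing reaction
        x = x + stoich[k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Example: the isomerization S1 -> S2, S2 -> S1 with linear propensities
stoich = np.array([[-1, 1], [1, -1]])
prop = lambda x: np.array([0.5 * x[0], 0.3 * x[1]])
ts, xs = gillespie([100, 0], stoich, prop, t_end=10.0)
```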

Moment approximations, which analyze the dynamics of the reaction network under consideration using the moments of the probability distribution satisfying the corresponding CME, are often considered as an alternative [39]. In [13], the author proposes the method of moments, which computes the moments for any reaction network from the corresponding CME. In [31], a moment closure approximation is introduced that consists of a finite set of ordinary differential equations (ODEs) for the mean and the central moments, obtained by truncating the moment hierarchy at a certain order and using a Taylor series. Another moment closure method, which approximates the moments above the truncation order by nonlinear closure functions of the lower order moments, is introduced in [38]. A variational formulation is used in [6] to identify principled closure schemes and to relate moment closure to other approximation techniques.

Cellular reaction systems involve reactions with very different rates and species with very different abundances. Models based only on the traditional deterministic modeling approach fail to account for this nature. Therefore, different hybrid methods that couple the stochastic and deterministic modeling approaches have been investigated. In general, hybrid methods separate reactions and/or species into different groups, and they use the diffusion or the deterministic modeling approach to describe the dynamics of fast reactions and/or species with high copy numbers, while a discrete Markov chain representation is utilized for slow reactions and/or species with low copy numbers. For example, in [8], the authors propose different hybrid models in four steps. In the first step, they partition the reactions/species into different classes; in the second step, they apply cycle averaging in the reaction network; in the third step, they find super reactions which are fast enough to change the amount of continuous variables or which have large stoichiometric vectors; and finally, they propose their hybrid model based on the Kramers–Moyal expansion. In [32], the authors express the probability density function (PDF) of a hybrid representation as a product of the probability distribution of the discrete species and the conditional probability density function (CPDF) of the continuous species given the discrete species. Then, they scale the population levels of the continuous species and use the Wentzel–Kramers–Brillouin approximation to approximate the CPDF. Finally, they derive a hybrid system involving the evolution equations for the PDF of the discrete species and for the expectation of the CPDF of the continuous species conditioned on the discrete species. In [37], another hybrid stochastic method is proposed which separates the reactions into fast and slow subgroups. It uses the CLE to describe the dynamics of the fast reactions, while the next reaction variant of the stochastic simulation algorithm [19] is applied to the slow ones. Different algorithms have been introduced which take into account the fact that one or multiple slow reactions can fire within a time step of the numerical integration of the corresponding Langevin equation. In [26], the authors proposed a hybrid method which splits the state vector into two classes of stochastic variables. In this method, it is assumed that the stochastic variables in the first class are normally distributed with a small variance, while the stochastic variables in the second class are Markovian. Then, Monte Carlo and quasi-Monte Carlo methods are used to approximate the expected values of the stochastic variables in the first class, while the SSA is used to obtain the marginal probabilities of the variables in the second class. Different types of hybrid models can be found in [7, 11, 14, 27].

In [25], the authors introduced the method of conditional moments (MCM), which can be considered as a combination of a hybrid method and a moment approximation method. The MCM separates the species into two classes: species with high copy numbers of molecules and species with low copy numbers of molecules. Based on this decomposition, the joint probability density function satisfying the corresponding CME is represented as a product of the marginal probabilities of the species with low copy numbers and the conditional probabilities of the species with high copy numbers conditioned on the remaining species with low copy numbers. To describe the dynamics of the species with low copy numbers, the authors used marginal probabilities, while the conditional means and the centered conditional moments are used to model the dynamics of the species with high copy numbers. In contrast to [25], in [3] the authors obtain the moments of the system of interest directly from the corresponding CME without any partitioning of the species, and the maximum entropy approach is used to construct the corresponding probability distribution.

In [17], we developed a jump-diffusion approximation to model the multi-scale behavior of cellular reactions. Based on an error bound, we separated the reactions into fast and slow groups. We employed a diffusion approximation for the fast reactions, while a Markov jump process was kept for the slow ones. As a result, the state of the system was defined as the sum of the RTCM and the solution of the corresponding CLE. In this paper, based on this representation, we present the hybrid master equation (HME), which is the evolution equation for the joint probability density function of the jump-diffusion approximation over the reaction counting process. The main contribution of the paper is to obtain the evolution equation for the probability density function over the reaction counting process. Unlike other studies [8, 25, 26, 32, 37], we define the probability density functions in terms of reaction counters and prove that the evolution equation for this PDF, which we call the HME, is the sum of the corresponding CME and the corresponding FPE [34]. Although it is possible to use different numerical methods to approximate the solution of the HME [35], we use the strategy proposed in [25]. To solve the HME, we obtain the evolution equation for the marginal probability of the slow reactions and the evolution equations for the conditional moments of the fast reactions given the slow reactions [25]. In contrast to [32], we do not use any scaling parameter for the probability density functions and do not approximate the conditional probability density function by a series expansion. Also, we use not only the expected values of the reaction counters of the fast reactions given the reaction counters of the slow reactions [26], but also their centered conditional moments. Using the maximum entropy approach, we construct the corresponding conditional probability at the time point of interest, which in turn gives the approximate solution of the corresponding HME. We note that the proposed algorithm approximates the joint probability density function over the reaction counting process at a desired time point without counting the firings of slow and fast reactions [37].

The rest of the paper is organized as follows: In Sect. 2, we describe the basic concepts of the stochastic modeling approach. In Sect. 3, we give a brief summary of the jump-diffusion approximation. We introduce the HME in Sect. 4. In Sect. 5, we construct an ODE system that will be used to obtain the approximate solution of the HME. In Sect. 6, we introduce the maximum entropy approach. In Sect. 7, we present numerical results and also explain the details of how we use the maximum entropy approach to construct the joint probability density function describing the HME. Section 8 concludes the paper.

Notation Before we give the details of the mathematical derivations, we present the basic notation used throughout the present paper. We represent random variables and their realizations by upper-case (e.g., \(A\)) and lower-case (e.g., \(a\)) symbols, respectively, and we use bold symbols to represent the support of a random variable (e.g., \( {\mathbf {A}}\)). Also, \(e_j\) and \({\bar{e}}_j\) denote the \((R-L) \times 1\) and \(L \times 1\) unit vectors, respectively, with 1 in the j-th component and \(0\) in all other coordinates.

2 Stochastic modeling of chemical kinetics

In this study, we consider a well-mixed reaction system of \(M\) species, \(S_1,S_2,\ldots , S_M\), interacting through \(R \ge 1\) reaction channels \(R_1,R_2,\ldots ,R_R\) inside the reaction compartment with volume \(V\). The k-th reaction channel of the system is described as follows:

$$\begin{aligned} r_{1k}S_1+r_{2k}S_2+r_{3k}S_{3}+\cdots +r_{Mk}S_{M} {\mathop {\longrightarrow }\limits ^{ \displaystyle \ell _{k}}} p_{1k}S_1+p_{2k}S_2+p_{3k}S_{3}+\cdots +p_{Mk}S_{M}, \end{aligned}$$

where \(r_{jk}, p_{jk} \in {\mathbb {N}}\), \(j=1,2,\ldots ,M\), represent the number of molecules of species \(S_j\) consumed and produced with a single occurrence of the reaction \(R_k\), respectively, and \(\ell _k\) is the real-valued stochastic reaction rate constant. Let \(X_i(t) \in {\mathbb {N}}_{0}\) denote the number of molecules of species \(S_i\), \(i=1,2,\ldots ,M\), at time \(t \ge 0\). Then, the state of the system at time \(t\) is \(X(t)=(X_1(t),X_2(t),\ldots , X_M(t))^{T} \in {\mathbb {N}}_{0}^{M} \).

The classical stochastic modeling of biochemical networks assumes that the process of \(X\) is a continuous time Markov chain (CTMC). In this approach, the state vector, \(X(t)\), is defined as a random variable of the Markov jump process. Each reaction channel \(R_k\) , \(k=1,2,\ldots ,R,\) is specified by its stoichiometric vector (state-change vector) and its propensity function. The stoichiometric vector \(\nu _k=(\nu _{1k},\nu _{2k},\ldots ,\nu _{Mk}) \in {\mathbb {Z}}^{M}\) with \(\nu _{jk}= p_{jk}-r_{jk}\), \(j=1,2,\ldots ,M\), represents the change in the state of the system after one occurrence of the reaction \(R_k\). In other words, when the reaction \(R_k\) fires, the system state \(X(t)=x\) jumps to a new state \(x+\nu _k\). Given \(X(t)=x\), the probability that one \(R_k\) reaction takes place in the time interval \([t,t+h)\) is \(a_k(x)h+o(h)\) where \(a_k(x): {\mathbb {N}}_{0}^{M} \rightarrow {\mathbb {R}}_{+}\) represents the propensity function calculated by the law of mass action kinetics, i.e.,

$$\begin{aligned} a_k(x)=\displaystyle \ell _k\prod _{i=1}^{M} \left( {\begin{array}{c}x_i\\ r_{ik}\end{array}}\right) . \end{aligned}$$

Let \(Z_k(t)\) denote the number of occurrences of the reaction \(R_k\) by time \(t\); then the state of the system at time \(t\) can be obtained as follows:

$$\begin{aligned} X(t)=X(0)+\displaystyle \sum _{k=1}^{R}Z_k(t) \nu _k. \end{aligned}$$

If we represent the counting process \(Z_k(t)\) in terms of an independent unit-rate Poisson process denoted by \(\xi _k\), such that \(Z_k(t)=\xi _k\Big ( \displaystyle \int _{0}^{t} a_k(X(s))ds\Big ) \), then the state vector of the above CTMC satisfies the following RTCM [2]

$$\begin{aligned} X(t)=X(0)+\displaystyle \sum _{k=1}^{R} \xi _{k}\left( \displaystyle \int _{0}^{t} a_{k}(X(s)) ds \right) \nu _{k}. \end{aligned}$$
(2.1)

Let us define the following probability mass function

$$\begin{aligned} p_t(x) = \mathrm {P} (X(t)=x). \end{aligned}$$

Another way of analyzing this CTMC is to consider the time evolution of the probability mass function \(p_t(x)\), which is the solution of the following Kolmogorov forward equation, known as the CME [20]

$$\begin{aligned} \displaystyle \frac{\partial p_t (x) }{\partial t}=\displaystyle \sum _{k=1}^{R} [a_k(x-\nu _{k}) p_t(x-\nu _{k})- a_{k}(x) p_t(x)]. \end{aligned}$$
(2.2)
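
For a system whose reachable state space is small enough to enumerate, Eq. (2.2) is a finite linear ODE system and can be integrated with a standard solver. The following Python sketch is our own illustration (all names are hypothetical); it assembles the right-hand side of the CME on a truncated state space:

```python
import numpy as np
from scipy.integrate import solve_ivp

def cme_rhs(t, p, states, index, stoich, propensities):
    """Right-hand side of the CME (2.2) on an enumerated truncated state
    space; `index` maps a state tuple to its position in the vector p."""
    dp = np.zeros_like(p)
    for s, x in enumerate(states):
        a = propensities(np.asarray(x))
        for k, nu in enumerate(stoich):
            dp[s] -= a[k] * p[s]                 # outflow: R_k leaves x
            target = tuple(np.asarray(x) + nu)
            if target in index:                  # inflow into x + nu_k
                dp[index[target]] += a[k] * p[s]
    return dp

# Example: S1 <-> S2 with x1 + x2 = 10 conserved, so the space is small
N = 10
states = [(i, N - i) for i in range(N + 1)]
index = {s: i for i, s in enumerate(states)}
stoich = [np.array([-1, 1]), np.array([1, -1])]
prop = lambda x: np.array([0.5 * x[0], 0.3 * x[1]])
p0 = np.zeros(len(states)); p0[index[(N, 0)]] = 1.0
sol = solve_ivp(cme_rhs, (0.0, 5.0), p0, args=(states, index, stoich, prop))
```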

When the number of molecules in the system is very high, the abundances of the species at time \(t\) can be represented by real-valued concentrations of the form \(U(t)=V^{-1} X(t) \in {\mathbb {R}}^{M}_{\ge 0}\). In most cases, reaction channels in biochemical systems are monomolecular or bimolecular. If the k-th reaction channel \(R_k\) is monomolecular or bimolecular, then its propensity function satisfies the equality \(a_{k}(x)= V {\tilde{a}}_k(u)\), where \({\tilde{a}}_k\) is the propensity function defined in terms of the deterministic reaction rate constant \({\tilde{\ell }}_k\).

It is well known that the centered version of each independent Poisson process \(\xi _k\) in Eq. (2.1) can be approximated through independent Brownian motions \(W_k(t)\) [2, 30]. Using the fact that \((\xi _{k}(V t)-V t)/ \sqrt{V} \) converges in distribution to the Brownian motion \(W_k(t)\) for large \(V\), we obtain the following diffusion approximation of Eq. (2.1)

$$\begin{aligned} U(t)=U(0)+\displaystyle \sum _{k=1}^{R} \nu _{k} \displaystyle \int _{0}^{t} {\tilde{a}}_{k}(U(s)) ds+ \frac{1}{\sqrt{V}} \displaystyle \sum _{k=1}^{R} \nu _{k} W_{k} \left( \displaystyle \int _{0}^{t} {\tilde{a}}_{k}(U(s)) ds\right) .\qquad \end{aligned}$$
(2.3)

The first and the second summands on the right-hand side of Eq. (2.3) are called the drift and diffusion terms, respectively. The state vector \(U(t)\) thus satisfies an Itô SDE, namely the CLE.
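
As an illustration, a path of Eq. (2.3) can be generated with the Euler–Maruyama scheme, replacing each Brownian increment by a normal variate with variance \({\tilde{a}}_k(u)\varDelta t\). The sketch below is our own minimal Python version; it assumes the propensities stay nonnegative along the path and clips them defensively:

```python
import numpy as np

def cle_euler_maruyama(u0, stoich, prop_conc, V, t_end, dt, rng=None):
    """Euler-Maruyama discretization of the CLE (2.3) for concentrations.

    stoich    : (R, M) array of stoichiometric vectors nu_k
    prop_conc : callable u -> length-R array of propensities a~_k(u)
    """
    rng = rng or np.random.default_rng()
    u = np.asarray(u0, dtype=float)
    n_steps = int(round(t_end / dt))
    path = np.empty((n_steps + 1, u.size))
    path[0] = u
    for n in range(n_steps):
        a = np.maximum(prop_conc(u), 0.0)   # clip so the sqrt stays real
        drift = stoich.T @ a
        noise = stoich.T @ (np.sqrt(a * dt) * rng.standard_normal(len(a)))
        u = u + drift * dt + noise / np.sqrt(V)
        path[n + 1] = u
    return path
```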

Let us define the following probability density function

$$\begin{aligned} q_t(u) du = \mathrm {P}( U(t) \in [u,u+du]). \end{aligned}$$

Then, the analog of the CME for this continuous process is the following FPE [21]

$$\begin{aligned} \frac{\partial q_t(u)}{\partial t}= & {} -\displaystyle \sum _{i=1}^{M} \frac{\partial }{\partial u_{i}}\left[ \left( \displaystyle \sum _{k=1}^{R} \nu _{ik}{\tilde{a}}_{k}(u)\right) q_t(u)\right] \\&+\frac{1}{2} \displaystyle \sum _{i,i'=1}^{M} \frac{\partial ^2}{\partial u_{i} \partial u_{i'}} \left[ \left( \displaystyle \sum _{k=1}^{R} \nu _{ik}\nu _{i'k} {\tilde{a}}_k(u)\right) q_t(u)\right] . \end{aligned}$$

Cellular processes consist of bimolecular reactions of very different speeds involving reactants of largely different abundances. Therefore, models based only on the RTCM or only on the diffusion approximation may be inappropriate for explaining the dynamics of such multi-scale processes. In [17], we developed a jump-diffusion approximation to model such processes. In the following section, we give a brief summary of this approximation.

3 Jump diffusion approximation

In the jump-diffusion approximation [17], we partition the reactions into a fast subgroup, \({\mathscr {C}}\), and a slow subgroup, \({\mathscr {D}}\), and model the fast group using a diffusion process, while the Markov chain representation is kept for the slow group.

In this approach, instead of the CTMC process represented by \(X\), we focus on the scaled abundances \({\bar{X}}^N_i = X_i/N^{\zeta _i}\), \(i=1,2,\ldots ,M\), and the scaled stochastic reaction rates \(\kappa _j=\ell _j/N^{\eta _j}\), \(j=1,2,\ldots , R\), such that \({\bar{X}}^N_i = O\left( 1\right) \) and \(\kappa _j = O\left( 1\right) \). Naturally, these scaled quantities produce new scaled propensity functions as follows

$$\begin{aligned} a_k\left( X(t)\right) = N^{\eta _{k}+r_{k}\cdot \zeta }{\bar{a}}_{k}({\bar{X}}^N (t)), \end{aligned}$$

where \(r_k=(r_{1k},r_{2k},\ldots ,r_{Mk} )\) and \(\zeta =(\zeta _1,\zeta _2,\ldots ,\zeta _M)\). We note that the functions \({\bar{a}}_{k}(\cdot )\) are also \(O(1)\). Finally, scaling time as \(t \rightarrow t N^{\theta }\) and defining \(X^{N}(t)={\bar{X}}^N(tN^{\theta })\), we transform the state vector \(X(t)\) given by Eq. (2.1) into the following scaled state vector

$$\begin{aligned} X^N\left( t\right)&= X^N\left( 0\right) + \sum _{k=1}^{R} \xi _k \left( \displaystyle \int _0^t \alpha _k(X^N(s)) ds \right) \ \mu _k, \end{aligned}$$
(3.1)

where \( \alpha _k(X^{N}(t))= N^{\rho _k} {\bar{a}}_{k} (X^{N}(t))\), \(\rho _k=\theta +\eta _k+r_k \cdot \zeta \) and \(\mu _{ik}=\nu _{ik}/N^{\zeta _i}\), \(i=1,2,\ldots , M\).

Modeling the fast reactions through diffusion approximation and modeling the slow reactions through Markov chains give the state vector of the jump diffusion approximation as follows:

$$\begin{aligned} Y(t)= & {} Y(0)+ \displaystyle \sum _{i \in {\mathscr {D}}} \xi _{i}\left( \displaystyle \int _{0}^{t} \alpha _{i} (Y (s)) ds\right) \mu _{i} +\displaystyle \sum _{j \in {\mathscr {C}}} \displaystyle \int _{0}^{t} \alpha _{j}(Y(s)) ds \, \mu _{j}\nonumber \\&+\displaystyle \sum _{j \in {\mathscr {C}}} W_{j} \left( \displaystyle \int _{0}^{t} \alpha _{j} (Y(s)) ds\right) \mu _{j}, \end{aligned}$$
(3.2)

where \(Y(0)=X^{N}(0)\), and \(W_j\) is a standard Brownian motion. If \(\tau _1\) and \(\tau _2\) denote two successive firing times of reactions from the slow group, then for \(\tau _1< t <\tau _2\) only reactions from the fast group can fire. Therefore, in this time interval, the state vector of the system is given by

$$\begin{aligned} Y(t)=Y(\tau _1)+\displaystyle \sum _{j \in {\mathscr {C}}} \displaystyle \int _{\tau _1}^{t} \alpha _{j}(Y(s)) ds\, \mu _{j} +\displaystyle \sum _{j \in {\mathscr {C}}} W_{j} \left( \displaystyle \int _{\tau _1}^{t} \alpha _{j} (Y(s)) ds\right) \mu _{j}. \end{aligned}$$
(3.3)

The main contribution of [17] is the derivation of an error bound for the mean error \(e(t)= {\mathsf {E}}\mid X^{N}(t)-Y(t) \mid \), which is used to partition the reaction set into fast and slow subgroups. Based on this error bound, we constructed a dynamic partitioning algorithm that takes into account the fact that a fast reaction can become slow, or vice versa, during the course of time.
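
The following Python sketch, which is our own illustration rather than the algorithm analyzed in [17], simulates one path of Eq. (3.2) on a fixed time grid: the fast group is advanced by Euler–Maruyama steps, while each slow reaction fires once its integrated propensity crosses an independent Exp(1) threshold, in the spirit of the next reaction method; firing times are resolved only up to the grid spacing.

```python
import numpy as np

def jump_diffusion_path(y0, mu, alpha, slow, fast, t_end, dt, rng=None):
    """One path of the hybrid process (3.2) on a fixed grid: the fast group
    evolves by Euler-Maruyama steps; each slow reaction fires when its
    integrated propensity crosses an independent Exp(1) threshold.

    mu    : (R, M) array of scaled stoichiometric vectors mu_k
    alpha : callable y -> length-R array of propensities
    slow, fast : lists of reaction indices for the sets D and C
    """
    rng = rng or np.random.default_rng()
    y = np.asarray(y0, dtype=float)
    clocks = rng.exponential(1.0, size=len(slow))   # Exp(1) firing thresholds
    integ = np.zeros(len(slow))                     # integrated slow hazards
    t = 0.0
    while t < t_end:
        a = np.maximum(alpha(y), 0.0)
        af = a[fast]
        # diffusion step for the fast group (drift + Brownian increment)
        y = y + mu[fast].T @ (af * dt) \
              + mu[fast].T @ (np.sqrt(af * dt) * rng.standard_normal(len(af)))
        # slow group: fire any reaction whose clock has rung on this step
        integ += a[slow] * dt
        for j, i in enumerate(slow):
            if integ[j] >= clocks[j]:
                y = y + mu[i]
                integ[j] = 0.0
                clocks[j] = rng.exponential(1.0)
        t += dt
    return y
```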

By describing the state vector of the system as the summation of purely discrete and purely continuous components, we can introduce the HME, which defines the joint probability density function of the jump diffusion approximation over the reaction counting process. In the following section, we will obtain the HME.

4 Hybrid master equation

In the jump-diffusion approximation, we partition the reaction set into two subsets. As mentioned before, the first subset \({\mathscr {C}}\) contains the reactions modeled by the diffusion approximation, while the remaining reactions, constituting the slow set \({\mathscr {D}}\), are modeled by Markov chains. In the rest of the study, we assume that there are \(L\) slow reactions, i.e., \(\mid {\mathscr {D}}\mid =L\), and \(R-L\) fast reactions in the system, i.e., \(\mid {\mathscr {C}}\mid =R-L\).

Let \(Z(t)=(Z_1(t),Z_2(t),\ldots ,Z_R(t))^{T}\) be a vector of reaction counters such that \(Z_{i}(t)\) denotes the number of occurrences of the reaction \(R_i\), \(i=1,2,\ldots ,R\), up to time \(t>0\). Similar to the idea of splitting the state vector of the system into purely discrete and purely continuous parts, we also separate \(Z(t)=(D(t),C(t))^{T}\) into purely discrete and continuous parts corresponding to the reaction counters of the slow set, \(D(t) \in {\mathbb {N}}^{L}\), and of the fast set, \(C(t) \in {\mathbb {R}}^{R-L}_{\ge 0} \), such that \(D_i(t)=Z_i(t),i \in {\mathscr {D}},\) and \(C_j(t)=Z_j(t),j \in {\mathscr {C}}\). We also separate the stoichiometric vectors such that \(\mu _{i}^{D}=\mu _i,\, i \in {\mathscr {D}},\) and \(\mu _{j}^{C}=\mu _{j},\, j \in {\mathscr {C}} \).

By using Eq. (3.2), we define the reaction counters as follows:

$$\begin{aligned} D_i(t)= & {} \xi _{i} \left( \displaystyle \int _{0}^{t} \alpha _i(Y(s)) ds\right) =\xi _{i} \left( \displaystyle \int _{0}^{t} {\tilde{\alpha }}_i(D(s),C(s)) ds\right) , \quad \quad i \in {\mathscr {D}} ,\\ C_j(t)= & {} \displaystyle \int _{0}^{t} \alpha _{j}(Y(s))ds+W_j\left( \int _{0}^{t} \alpha _{j}(Y(s))ds \right) \\= & {} \displaystyle \int _{0}^{t} {\tilde{\alpha }}_{j}(D(s), C(s))ds+W_j \left( \int _{0}^{t} {\tilde{\alpha }}_{j}(D(s),C(s))ds\right) , \quad \quad j \in {\mathscr {C}}, \end{aligned}$$

where

$$\begin{aligned} \alpha _k(y)=\alpha _k\left( y(0)+\displaystyle \sum _{i \in {\mathscr {D}}} d_{i} \mu _{i}^{D}+\displaystyle \sum _{j \in {\mathscr {C}}} c_j \mu _j^{C}\right) ={\tilde{\alpha }}_k(d,c), \, k=1,2,\ldots , R.\quad \end{aligned}$$
(4.1)

We note that if \(\tau _1\) and \(\tau _2\) denote two successive firing times of reactions from the slow group, then for \(\tau _1<t<\tau _2\), \(C(t)\) satisfies the following equation

$$\begin{aligned} C(t)=C(\tau _1)+\displaystyle \sum _{j \in {\mathscr {C}}} \left( \displaystyle \int _{\tau _1}^{t} {\tilde{\alpha }}_{j}(d,C(s)) ds\right) e_j+\displaystyle \sum _{j \in {\mathscr {C}}} W_j \left( \displaystyle \int _{\tau _1}^{t} {\tilde{\alpha }}_{j}(d,C(s)) ds\right) e_j, \end{aligned}$$
(4.2)

where \(d \) denotes the numbers of firings of the slow reactions up to time \(\tau _1 > 0\).

The HME is the evolution equation for the joint probability density function \(p_t: {\mathbb {N}}^{L} \times {\mathbb {R}}^{R-L}_{ \ge 0} \rightarrow {\mathbb {R}}_{\ge 0} \) defined by

$$\begin{aligned} p_t(d,c)dc= \mathrm {P}(D(t)=d, C(t)\in [c,c+dc] ). \end{aligned}$$
(4.3)

Then, we can write

$$\begin{aligned} p_t(d,c)=p_t(c\mid d) p_t(d), \end{aligned}$$

where

$$\begin{aligned} p_t(c \mid d)\, dc= & {} \mathrm {P}(C(t) \in [c,c+dc] \mid D(t)=d)\\ p_t(d)= & {} \mathrm {P}(D(t)=d). \end{aligned}$$

To obtain the evolution equation for \(p_t(d,c)\), which we call the HME, we need the following result, whose details can be found in [34].

Result Let \(D(t) \in {\mathbf {D}} \subset {\mathbb {N}}^{L}\) be a discrete process and \(C(t) \in {\mathbb {R}}^{R-L}_{ \ge 0}\) be a continuous process. Define the joint probability density function as follows:

$$\begin{aligned} p_t(d,c) =p_t(c\mid d) p_t(d). \end{aligned}$$

Then, the time derivative of this joint probability function, which is referred to as generalized Fokker–Planck equation (GFPE), has the following form

$$\begin{aligned} \frac{\partial }{\partial t }p_t(d,c)= & {} \displaystyle \sum _{d' \in {\mathbf {D}}} a_{dd'}p_{t}(d',c) +\displaystyle \sum _{n_1,n_2,\ldots ,n_{R-L}=1}^{ \infty } \left( \displaystyle \prod _{i=1}^{R-L} \frac{(-1)^{n_{i}} \frac{\partial ^{n_{i}}}{\partial c_{i}^{n_{i}} }}{n_{i}!}\right) [A_{n_1,n_2,\ldots ,n_{R-L}}p_t(d,c)],\nonumber \\ \end{aligned}$$
(4.4)

where

$$\begin{aligned} A_{n_1,n_2,\ldots ,n_{R-L}} =\lim _{h \rightarrow 0} \frac{1}{h} {\mathsf {E}}\left[ \prod _{i=1}^{R-L} \{C_{i}(t+h)-C_{i}(t)\}^{n_{i}} \mid D(t)=d,C(t),D(t+h)=d\right] , \end{aligned}$$

and

$$\begin{aligned} a_{dd'}=\lim _{ h \rightarrow 0} \frac{1}{h}[\mathrm {P}(D(t+h)=d\mid D(t)=d', C(t))- \delta _{d d'}]. \end{aligned}$$
(4.5)

where \({\mathsf {E}}[C(t) \mid d]= \displaystyle \int _{ \mathbb {R}_{ \ge 0}^{R-L}} c \, p_t(c \mid d) \, dc\). It is also proved that \(A_{n_{1},n_{2},\ldots ,n_{R-L}}=0\) for all \(\displaystyle \sum _{i=1}^{R-L} n_{i} \ge 3\). This gives us

$$\begin{aligned}&\displaystyle \sum _{n_1,n_2,\ldots ,n_{R-L}=1}^{\infty } \left( \displaystyle \prod _{i=1}^{R-L} \frac{(-1)^{n_{i}} \frac{\partial ^{n_{i}}}{\partial c_{i}^{n_{i}}}}{n_{i}!} \right) [A_{n_1,n_2,\ldots ,n_{R-L}}p_t(d,c)]=- \displaystyle \sum _{j=1}^{R-L} \frac{\partial }{\partial c_{j}} [B_{j} p_t(d,c)] \nonumber \\&\quad +\, \frac{1}{2} \displaystyle \sum _{i,j=1}^{R-L} \frac{\partial ^{2}}{\partial c_{i} \partial c_{j}} [B_{ij} p_t(d,c)], \end{aligned}$$
(4.6)

where

$$\begin{aligned} B_{j}= \lim _{h \rightarrow 0} \displaystyle \frac{1}{h} {\mathsf {E}}[(C_{j}(t+h)- C_{j}(t))\mid D(t)=d, C(t), D(t+h )=d], \end{aligned}$$

and

$$\begin{aligned} B_{ij}= \lim _{h \rightarrow 0} \displaystyle \frac{1}{h} {\mathsf {E}}[\{C_{i}(t+h)- C_{i}(t)\} \{C_{j}(t+h) - C_{j}(t)\} \mid D(t)=d, C(t), D(t+h )=d]. \end{aligned}$$

Theorem 4.1

Let \(Z(t)=\{ D(t),C(t)\}\) be a joint reaction counting process where \(D(t)\) is a discrete random process with states \(d \in {\mathbf {D}} \subset {\mathbb {N}}^{L}, \, L >0,\) and \(C(t)\) is a continuous random process with states \(c \in {\mathbb {R}}^{R-L}_{\ge 0}\), \(R-L>0\). Define \(Y\) as a multi-scale process whose state vector is given in Eq. (3.2). Then, the joint counting probability density function given in Eq. (4.3) satisfies the following GFPE, which is referred to as the HME in the present paper.

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_{t}(d,c)= & {} \displaystyle \sum _{i \in {\mathscr {D}}} \Big ({\tilde{\alpha }}_{i}(d-{\bar{e}}_i,c) p_{t}(d-{\bar{e}}_i,c)-{\tilde{\alpha }}_{i}(d,c)p_t(d,c)\Big ) \nonumber \\&-\displaystyle \sum _{j \in {\mathscr {C}}}\displaystyle \frac{\partial }{\partial c_j} ( {\tilde{\alpha }}_{j}(d,c) p_t(d,c)) \nonumber \\&+\frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}} \frac{\partial ^{2}}{\partial c_{j}^{2}} ( {\tilde{\alpha }}_{j}(d,c)p_t(d,c)). \end{aligned}$$
(4.7)

Proof

By using Eqs. (4.4) and (4.6), we obtain

$$\begin{aligned} \frac{\partial }{\partial t} p_{t}(d,c)= & {} \displaystyle \sum _{d' \in {\mathbf {D}}} a_{dd'}p_t(d',c) - \displaystyle \sum _{j \in {\mathscr {C}}}\frac{\partial }{\partial c_{j}} (B_j p_{t}(d,c)) \nonumber \\&+\frac{1}{2} \displaystyle \sum _{i,j \in {\mathscr {C}}}\frac{\partial ^2}{ \partial c_i \partial c_j} (B_{ij} p_{t}(d,c)). \end{aligned}$$
(4.8)

Now, let’s focus on the first summand on the right hand-side of Eq. (4.8), which can be rewritten as follows:

$$\begin{aligned} \sum _{d' \in {\mathbf {D}}} a_{dd'} p_{t}(d',c)=\displaystyle \sum _{{\begin{array}{c} d' \in {\mathbf {D}}\\ d \ne d' \end{array}}}a_{dd'}p_{t}(d',c)+ a_{dd}p_{t}(d,c). \end{aligned}$$
(4.9)

Using Eq. (4.5) gives

$$\begin{aligned} a_{dd}= \lim _{h \rightarrow 0} \frac{1}{h} [\mathrm {P}(D(t+h)=d\mid D(t)=d,C(t))-1], \end{aligned}$$

which can be reformulated as follows

$$\begin{aligned} a_{dd}= \lim _{h \rightarrow 0} \frac{1}{h} \left[ -\displaystyle \sum _{{\begin{array}{c} d' \in {\mathbf {D}}\\ d \ne d' \end{array}}} \mathrm {P}(D(t+h)=d'\mid D(t)=d,C(t)) \right] . \end{aligned}$$

By using this representation, we can rewrite Eq. (4.9) in the following form

$$\begin{aligned} \displaystyle \sum _{d' \in {\mathbf {D}}} a_{dd'}p_{t}(d',c) = \displaystyle \sum _{{\begin{array}{c} d' \in {\mathbf {D}}\\ d \ne d' \end{array}}} \Big ( a_{dd'} p_{t}(d',c)- a_{d'd} p_{t}(d,c) \Big ) .\end{aligned}$$

In our multi-scale process, we have \(L\) slow reactions, and one firing of the reaction \(R_j\) in this set updates \(d\) to \(d+{\bar{e}}_j\). Starting from \(d\), the system can jump to \(d'=d+{\bar{e}}_j\), meaning that \(a_{d+{\bar{e}}_j,d}={\tilde{\alpha }}_j(d,c)\). In the same vein, to reach \(d\), the system must previously have been in the state \(d-{\bar{e}}_j\), and by definition \(a_{d,d-{\bar{e}}_j}= {\tilde{\alpha }}_{j}(d-{\bar{e}}_j,c).\) As a result, we obtain the desired summand as follows:

$$\begin{aligned} \sum _{d' \in {\mathbf {D}}} a_{dd'} p_t(d',c)=\displaystyle \sum _{i \in {\mathscr {D}}}\Big ({\tilde{\alpha }}_i(d-{\bar{e}}_i,c) p_t(d-{\bar{e}}_i,c)-{\tilde{\alpha }}_i(d,c)p_t(d,c)\Big ). \end{aligned}$$
(4.10)

Now we concentrate on the second and the third summands of Eq. (4.8). The jump-diffusion approximation is based on the idea that, between two successive firing times of the slow reactions, the fast reactions continue to fire. Hence, the state vector and the reaction counting process of the fast reaction set satisfy diffusion processes (see Eqs. 3.3, 4.2). Therefore, \(B_j\) and \(B_{ij}\) have the forms [21, 29]

$$\begin{aligned} B_j= \displaystyle \sum _{k \in {\mathscr {C}}} e_{jk} {\tilde{\alpha }}_{k}(d,c), \quad B_{ij}=\displaystyle \sum _{k \in {\mathscr {C}}} e_{ik} e_{jk} {\tilde{\alpha }}_{k}(d,c). \end{aligned}$$

Substitution of \(B_j\) and \(B_{ij}\) values and Eq. (4.10) into Eq. (4.8) gives

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_{t}(d,c)= & {} \displaystyle \sum _{i \in {\mathscr {D}}} \Big ({\tilde{\alpha }}_{i}(d-{\bar{e}}_i,c) p_{t}(d-{\bar{e}}_i,c)-{\tilde{\alpha }}_{i}(d,c)p_t(d,c)\Big ) \\&-\displaystyle \sum _{j \in {\mathscr {C}}}\displaystyle \frac{\partial }{\partial c_j} \left( \displaystyle \sum _{k \in {\mathscr {C}}} e_{jk} {\tilde{\alpha }}_{k}(d,c) p_{t}(d,c) \right) \\&+ \frac{1}{2} \displaystyle \sum _{i,j \in {\mathscr {C}}}\frac{\partial ^2}{ \partial c_i \partial c_j} \left( \displaystyle \sum _{k \in {\mathscr {C}}} e_{ik} e_{jk} {\tilde{\alpha }}_{k}(d,c) p_{t}(d,c)\right) \\= & {} \displaystyle \sum _{i \in {\mathscr {D}}} \Big ({\tilde{\alpha }}_{i}(d-{\bar{e}}_i,c) p_{t}(d-{\bar{e}}_i,c)-{\tilde{\alpha }}_{i}(d,c)p_t(d,c)\Big ) \\&-\displaystyle \sum _{j \in {\mathscr {C}}}\displaystyle \frac{\partial }{\partial c_j} ({\tilde{\alpha }}_{j}(d,c) p_{t}(d,c) ) + \frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}}\frac{\partial ^2}{ \partial c_j^2} ( {\tilde{\alpha }}_{j}(d,c) p_{t}(d,c)), \end{aligned}$$

which completes the proof. \(\square \)

Based on the properties of the joint counting probability density function, we can write

$$\begin{aligned} p_t(d,c)=p_t(c\mid d)p_t(d). \end{aligned}$$

Since we partition the reaction counters into two subsets, we also decompose the propensity functions. Propensities computed by mass action kinetics are the most common case, and for this large class we partition the propensity function of the reaction \(R_k\), \({\tilde{\alpha }}_k(d,c)\), \(k=1,2,\ldots ,R\), as follows:

$$\begin{aligned} {\tilde{\alpha }}_k(d,c)= & {} \alpha _{k}\left( y(0)+\displaystyle \sum _{i \in {\mathscr {D}}}d_{i}\mu _{i}^{D}+ \displaystyle \sum _{j \in {\mathscr {C}}}c_{j} \mu _{j}^{C}\right) \\= & {} \kappa _{k} \displaystyle \prod _{s=1}^{M}\left( y_s(0)+\displaystyle \sum _{i \in {\mathscr {D}}}d_i \mu _{si}^{D}+\displaystyle \sum _{j \in {\mathscr {C}}} c_j \mu _{sj}^{C}\right) ^{r_{sk}}\\= & {} \kappa _{k} \displaystyle \prod _{s=1}^{M}(\beta _s(d)+\gamma _s(c))^{r_{sk}} = \kappa _{k} \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sk} } \left( {\begin{array}{c}r_{sk}\\ n\end{array}}\right) \beta _{s}^{n}(d) \gamma _{s}^{r_{sk}-n}(c) \end{aligned}$$

where \(\beta _s(d)=y_s(0)+\displaystyle \sum _{i \in \mathscr {D}}d_{i} \mu _{si}^{D}\), \(\gamma _{s}(c)=\displaystyle \sum _{j \in \mathscr {C}} c_{j} \mu _{sj}^{C}\). With a slight abuse of notation but in order to avoid clutter, we will subsequently subsume the binomial coefficient into the definitions of \(\beta _s\) and \(\gamma _s\). Based on this representation, the HME given in Eq. (4.7) can be rewritten in the following form

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_{t}(d,c)= & {} \displaystyle \sum _{i \in {\mathscr {D}}}\displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \kappa _{i} \Big (\beta _{s}^{n}(d-{\bar{e}}_{i}) \gamma _{s}^{r_{si}-n}(c) p_{t}(d-{\bar{e}}_{i},c)\nonumber \\&- \beta _{s}^{n}(d) \gamma _{s}^{r_{si}-n}(c) p_{t}(d,c) \Big ) \nonumber \\&-\displaystyle \sum _{j \in {\mathscr {C}}} \displaystyle \frac{\partial }{\partial c_j} \left( \kappa _j \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) \gamma _{s}^{r_{sj}-n}(c) p_{t}(d,c) \right) \nonumber \\&+ \frac{1}{2} \displaystyle \sum _{ j \in {\mathscr {C}}} \frac{\partial ^{2}}{ \partial c_{j}^{2}} \left( \kappa _j \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}}\beta _{s}^{n}(d) \gamma _{s}^{r_{sj}-n}(c) p_t(d,c)\right) . \end{aligned}$$
(4.11)

Let \(f(d): {\mathbf {D}} \rightarrow {\mathbb {R}}\) and \(g(c):{\mathbb {R}}_{\ge 0}^{R-L} \rightarrow {\mathbb {R}} \) be any functions of the variables \(d\) and \(c\), respectively. To simplify the notation, we introduce the one-step operator

$$\begin{aligned} {\mathscr {F}}^{{\bar{e}}_i} \Big ( f(d) g(c)\Big )= f(d+ {\bar{e}}_i) g(c), i \in {\mathscr {D}}. \end{aligned}$$

Based on this representation, we define

$$\begin{aligned} \varGamma (\beta (d), \gamma (c))= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) \gamma _{s}^{r_{si}-n}(c) \right) \\&- \displaystyle \sum _{j \in {\mathscr {C}}} \frac{ \partial }{ \partial c_j} \left( \kappa _j \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) \gamma _{s}^{r_{sj}-n}(c) \right) \\&+\frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}} \frac{ \partial ^2 }{ \partial c_j^2} \left( \kappa _j \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) \gamma _{s}^{r_{sj}-n}(c) \right) . \end{aligned}$$

Then, we can write Eq. (4.11) in the following form

$$\begin{aligned} \frac{ \partial }{ \partial t} p_t(d,c) = \varGamma (\beta (d), \gamma (c)) p_t(d,c). \end{aligned}$$
(4.12)

In the rest of the study, we will assume that \(p_t(d,c)\) vanishes at \(c=0\) and \(c=\infty \) [24, 36]. In the following section, we will explain how we obtain the solution of this HME.

5 Solution of the hybrid master equation

To obtain the joint counting probability density function \(p_t(d,c)\) described by the HME given in Eq. (4.12), we will approximate the process \(C(t) \mid D(t)\) using its moments. Solving a maximum entropy problem for each set of conditional moments will produce the conditional probability function \(p_t(c \mid d)\). The product of \(p_t(c\mid d)\) with the marginal probabilities of the remaining discrete states, i.e., \(p_t(d)= \int _{{\mathbb {R}}_{ \ge 0}^{R-L}} p_t(d,c) \, dc,\) will give us the desired joint probability density function \(p_t(d,c)\).

In the rest of the study, time dependent conditional means and the centered conditional moments of the process \(C(t)\mid D(t)\) will be denoted by

$$\begin{aligned} {\mathsf {E}}_{t}[C_m\mid d]= & {} \displaystyle \int _{{\mathbb {R}}_{ \ge 0}^{R-L}} c_m p_{t}(c\mid d) \,dc, \quad m \in {\mathscr {C}},\\ {\mathsf {E}}_{t}[{\tilde{C}}^{M}\mid d]= & {} \displaystyle \int _{{\mathbb {R}}_{ \ge 0}^{R-L}} \displaystyle \prod _{j \in {\mathscr {C}}}{\tilde{c}}_{j}^{M_j} p_t(c \mid d)dc, \end{aligned}$$

where \({\tilde{c}}=c-{\mathsf {E}}_{t}[C\mid d]\) and \(M=(M_1,M_2,\ldots ,M_{R-L})^{T} \in {\mathbb {N}}^{R-L}\). Now, based on [25], we construct a differential equation system to obtain \(p_{t}(d)\), \({\mathsf {E}}_t[ C_m\mid d]\) and \({\mathsf {E}}_{t}[{\tilde{C}}^{M}\mid d ]\). To construct this system, we need the following lemma [13, 25].

Lemma 5.1

Let \(F_t: {\mathbb {R}}^{R-L}_{\ge 0} \longrightarrow {\mathbb {R}}\) be a polynomial function of \(c\), and \(p_{t}(d,c)\) satisfy differential equation (4.12). Assume that sufficiently many moments of \(p_{t}(d,c)\) with respect to \(c\) exist, and the joint counting probability density vanishes at \(c=0\) and \(c=\infty \). Define the following conditional mean

$$\begin{aligned} {\mathsf {E}}_t[F_t(C)\mid d ]=\displaystyle \int _{{\mathbb {R}}_{ \ge 0}^{R-L}} F_t(c) p_{t}(c\mid d) \,dc. \end{aligned}$$

Then,

$$\begin{aligned} \frac{\partial }{\partial t}({\mathsf {E}}_t[ F_t(C)\mid d]p_t(d))= {\tilde{\varGamma }}( \beta (d), F_t(c)\gamma (c)) p_t(d)+{\mathsf {E}}_t\left[ \frac{ \partial }{\partial t} F_t(C) \mid d\right] p_t(d), \end{aligned}$$

where

$$\begin{aligned} {\tilde{\varGamma }}(\beta (d), F_t(c)\gamma (c))= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_{t}[ F_t(C) \gamma _{s}^{r_{si}-n}(C) \mid d] \right) \\&+ \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _{j} \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) {\mathsf {E}}_{t}\big [ \gamma _{s}^{r_{sj}-n}(C) \frac{\partial }{\partial c_j} F_t(C) \mid d\big ]\\&+\frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _{j} \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) {\mathsf {E}}_t\big [ \gamma _{s}^{r_{sj}-{n}}(C) \frac{\partial ^{2}}{\partial c_j^{2}} F_t(C) \mid d\big ]. \end{aligned}$$

Proof

The proof of the Lemma can be found in “Appendix  A.1”. \(\square \)

When \(F_t(c)=1\) in Lemma 5.1, we obtain the time derivative of the marginal probability \(p_t(d)\), which is given in the following proposition.

Proposition 5.1

$$\begin{aligned} \frac{\partial }{\partial t} p_{t}(d)= \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t[\gamma _{s}^{r_{si}-n}(C)\mid d]p_t(d)\right) \end{aligned}$$
(5.1)

The strategy of our method is to obtain \(p_{t}(d)\) and \(p_{t}(c\mid d)\) separately and to construct the joint probability function using the equality \(p_t(d,c)=p_t(c\mid d)p_{t}(d)\). To obtain the conditional probability \(p_t(c\mid d)\), we will use the evolution equations of the conditional means \({\mathsf {E}}_t[C_m\mid d]\), \(m \in {\mathscr {C}}\), and of the centered conditional moments \( {\mathsf {E}}_t[ {\tilde{C}}^{M}\mid d ]\), \(M \in {\mathbb {N}}^{R-L}\), which are functions of \(p_t(d)\), \({\mathsf {E}}_t[C_m\mid d]\) and \({\mathsf {E}}_t[ {\tilde{C}}^{M}\mid d ]\). Equation (5.1) will be the first equation of our system. We note that the differential equation defining the marginal probability depends only on the slow reactions. To solve this differential equation, we need to reformulate the unknown conditional means \({\mathsf {E}}_t[ \gamma _s^{r_{si}-n}(C)\mid d]\) in terms of the known quantities \({\mathsf {E}}_t[C_m\mid d]\) and \({\mathsf {E}}_t[ {\tilde{C}}^{M}\mid d ]\). The details of this transformation can be found in “Appendix A.2”.

In the following proposition, we will obtain the time evolution equation for the conditional means \({\mathsf {E}}_t[C_m\mid d], \, m \in {\mathscr {C}}\).

Proposition 5.2

$$\begin{aligned} p_t(d) \frac{\partial }{\partial t } {\mathsf {E}}_t[C_m\mid d]= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t[C_m \gamma _{s}^{r_{si}-n}(C)\mid d] p_t(d) \right) \nonumber \\&+ \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _s^{n}(d) {\mathsf {E}}_t[\gamma _{s}^{r_{sj}-n}(C) \delta _{jm} \mid d]p_t(d) \nonumber \\&- \,{\mathsf {E}}_t[C_m\mid d] \frac{\partial }{\partial t } p_t(d) \end{aligned}$$
(5.2)

where \(\delta _{jm}\) is the Kronecker delta function.

Proof

The proof of the proposition can be found in “Appendix A.3”. \(\square \)

In the following proposition, we will obtain \(p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[{\tilde{C}}^{M}\mid d]\).

Proposition 5.3

$$\begin{aligned}&p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[{\tilde{C}}^{M}\mid d]\nonumber \\&\quad = \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t[{\tilde{C}}^{M} \gamma _{s}^{r_{si}-n}(C)\mid d] p_t(d) \right) \nonumber \\&\qquad + \displaystyle \sum _{j \in {\mathscr {C}}}\kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) {\mathsf {E}}_t[M_j\gamma _{s}^{r_{sj}-n}(C) {\tilde{C}}^{M-e_j}\mid d]p_t(d) \nonumber \\&\qquad + \frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _s^{n}(d) {\mathsf {E}}_t[M_j (M_j-1) \gamma _{s}^{r_{sj}-n}(C) {\tilde{C}}^{M-2e_j} \mid d ] p_t(d) \nonumber \\&\qquad - \displaystyle \sum _{j \in {\mathscr {C}}}M_j {\mathsf {E}}_{t}[{\tilde{C}}^{M-e_j}\mid d] p_t(d) \frac{\partial }{\partial t }{\mathsf {E}}_t[C_j\mid d]- {\mathsf {E}}_t[{\tilde{C}}^{M}\mid d] \frac{\partial }{\partial t} p_t(d). \end{aligned}$$
(5.3)

Proof

The proof of the proposition can be found in “Appendix A.4”. \(\square \)

So far, we have obtained the time derivatives of the marginal probabilities as well as those of the conditional means and the centered conditional moments. These three equations give us the following differential equation system.

Theorem 5.4

Let \(p_t(d,c)=p_t(c\mid d) p_t(d)\) satisfy Eq. (4.7). Then, the time derivatives of \( p_t(d)\), \({\mathsf {E}}_t [C_m\mid d]\), \(m \in {\mathscr {C}}\), and \({\mathsf {E}}_t[{\tilde{C}}^{M}\mid d] \), \(M=(M_1,M_2, \ldots , M_{R-L})\), satisfy the following system

$$\begin{aligned} \frac{\partial }{\partial t} p_{t}(d)= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t\big [\gamma _{s}^{r_{si}-n}(C)\mid d\big ]p_t(d)\right) \nonumber \\ p_t(d) \frac{\partial }{\partial t } {\mathsf {E}}_t\big [C_m\mid d\big ]= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t\big [C_m \gamma _{s}^{r_{si}-n}(C)\mid d\big ] p_t(d) \right) \nonumber \\&+ \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _s^{n}(d) {\mathsf {E}}_t\big [\gamma _{s}^{r_{sj}-n}(C) \delta _{jm}\mid d\big ]p_t(d) \nonumber \\&-\, {\mathsf {E}}_t\big [C_m\mid d\big ] \frac{\partial }{\partial t } p_t(d) \nonumber \\ p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t\big [{\tilde{C}}^{M}\mid d\big ]= & {} \displaystyle \sum _{i \in {\mathscr {D}}} ( {\mathscr {F}}^{-{\bar{e}}_i} -I) \left( \kappa _i \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{si}} \beta _{s}^{n}(d) {\mathsf {E}}_t\big [{\tilde{C}}^{M} \gamma _{s}^{r_{si}-n}(C)\mid d\big ] p_t(d) \right) \nonumber \\&+ \displaystyle \sum _{j \in {\mathscr {C}}}\kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _{s}^{n}(d) {\mathsf {E}}_t\big [M_j\gamma _{s}^{r_{sj}-n}(C) {\tilde{C}}^{M-e_j}\mid d\big ]p_t(d) \nonumber \\&+ \frac{1}{2} \displaystyle \sum _{j \in {\mathscr {C}}} \kappa _j \displaystyle \prod _{s=1}^{M} \displaystyle \sum _{n=0}^{r_{sj}} \beta _s^{n}(d) {\mathsf {E}}_t \big [M_j (M_j{-}1) \gamma _{s}^{r_{sj}{-}n}(C) {\tilde{C}}^{M{-}2e_j} \mid d \big ] p_t(d) \nonumber \\&-\, \displaystyle \sum _{j \in {\mathscr {C}}}M_j {\mathsf {E}}_{t}\big [{\tilde{C}}^{M-e_j}\mid d\big ] p_t(d) \frac{\partial }{\partial t }{\mathsf {E}}_t\big [C_j\mid d\big ]\nonumber \\&-\, {\mathsf {E}}_t\big [{\tilde{C}}^{M}\mid d\big ] \frac{\partial }{\partial t} p_t(d), \end{aligned}$$
(5.4)

where \(\delta _{jm}\) is the Kronecker delta function. Also, \({\mathscr {F}}^{{\bar{e}}_i}\) is the one-step operator defined by

$$\begin{aligned} {\mathscr {F}}^{{\bar{e}}_i} \Big ( f(d) g(c)\Big )= f(d+ {\bar{e}}_i) g(c), i \in {\mathscr {D}}. \end{aligned}$$

where \(f(d): {\mathbf {D}} \rightarrow {\mathbb {R}}\) and \(g(c):{\mathbb {R}}_{\ge 0}^{R-L} \rightarrow {\mathbb {R}} \) are any functions of the variables \(d\) and \(c\), respectively.

In the following section, we will explain the details of the maximum entropy method which will be used to construct the conditional probability distribution \(p_t(c \mid d)\).

6 Maximum entropy

Assume that we want to obtain the solution of the HME under consideration at a specific time point \(\tau >0\). Solving the ODE system in (5.4) gives the values of \(p_{\tau }(d) \), \({\mathsf {E}}_{\tau }[C_m \mid d] \) and \( {\mathsf {E}}_{\tau } [{\tilde{C}}^{M} \mid d]\), \(m \in {\mathscr {C}}\), \(M=(M_1,M_2,\ldots ,M_{R-L})^{T} \in {\mathbb {N}}^{R-L}\), for the system of interest. Although the marginal probabilities \(p_{\tau }(d)\) can be obtained directly from the ODE system, we still do not know the corresponding conditional probability density function \(p_{\tau }(c \mid d)\), which is needed to construct the joint probability \(p_{\tau }(d,c)\) solving the corresponding HME.

To estimate the unknown conditional probability density function from its moments, we use the maximum entropy approach proposed by Jaynes [28]. Assume that we have a state space \( \varOmega ={\mathbf {D}} \times {\mathbb {R}}_{\ge 0}^{R-L}\) and our goal is to estimate the unknown probability density function \( p_{\tau }: \varOmega \rightarrow {\mathbb {R}}_{\ge 0}\). Let

$$\begin{aligned} {\mathscr {S}}_{\tau }^{M}=\displaystyle \int _{{\mathbb {R}}_{\ge 0}^{R-L}} \prod _{j \in {\mathscr {C}}} c_{j}^{M_j} p_{\tau }(c \mid d)dc, \quad M=(M_1,M_2,\ldots , M_{R-L}) \in {\mathbb {N}}^{R-L}, \end{aligned}$$

denote the moments of the conditional probability density function at the time point \(\tau \). It should be noted that when \(M=e_m\), we obtain \({\mathsf {E}}_{\tau } [C_{m} \mid d]\). To guarantee that \(p_{\tau }(c \mid d)\) is a probability density function, we must impose the condition \({\mathscr {S}}_{\tau }^{0}=1\). Then, the approximation of the conditional probability density function \(p_{\tau }(c\mid d)\) is obtained by solving the following constrained convex optimization problem

$$\begin{aligned} \mathrm {Minimize} \quad&\displaystyle \int _{ {\mathbb {R}}^{R-L}_{\ge 0}} p_{\tau }{(c \mid d) } \ln ( p_{\tau } (c \mid d))dc \\ \mathrm {Subject \, to} \quad {\mathscr {S}}_{\tau }^{0}= & {} \displaystyle \int _{ {\mathbb {R}}^{R-L}_{\ge 0}} p_{\tau }{(c \mid d) } \, dc =1 \\ {\mathscr {S}}_{\tau }^{e_m}= & {} {\mathsf {E}}_{\tau }[C_m \mid d] =\displaystyle \int _{ {\mathbb {R}}^{R-L}_{\ge 0}} c_m p_{\tau }{(c \mid d) } \, dc \\ {\mathscr {S}}_{\tau }^{M}= & {} {\mathsf {E}}_{\tau }[C^M \mid d] =\displaystyle \int _{ {\mathbb {R}}^{R-L}_{\ge 0}} \displaystyle \prod _{ j \in {\mathscr {C}}}c_j^{M_j} p_{\tau }{(c \mid d) } \, dc. \end{aligned}$$

Let \(N\) be the number of moment constraints, and let \(M^k\), \(k=0,1,\ldots , N\), denote different choices of the vectors \(M^k=(M_{1}^{k},M_{2}^{k}, \ldots , M_{R-L}^{k} )\in {\mathbb {N}}^{R-L}\). To impose the conditions given above, we set \(M^0=0\) and \(M^{j}=e_j\), \(j=1,2,\ldots , R-L\). Then, the solution of this constrained optimization problem can be obtained by maximizing the following Lagrange function

$$\begin{aligned} {\mathscr {L}}(p_{\tau }(c\mid d), \lambda (\tau ))= & {} - \displaystyle \int _{{\mathbb {R}}^{R-L}_{\ge 0}} p_{\tau }{(c \mid d) } \ln ( p_{\tau } (c \mid d))dc \\&+ \displaystyle \sum _{k=0}^{N} \lambda _{k}(\tau ) \left( \displaystyle \int _{{\mathbb {R}}^{R-L}_{\ge 0}} \displaystyle \prod _{ j \in {\mathscr {C}}}c_j^{M_{j}^{k}} p_{\tau }{(c \mid d) }dc -{\mathscr {S}}_{\tau }^{M^k}\right) , \end{aligned}$$

where \(\lambda _k \in {\mathbb {R}}\), \(k=0,1,\ldots ,N\), are referred to as Lagrange multipliers. Setting the derivative of \({\mathscr {L}}(p_{\tau }(c\mid d), \lambda (\tau ))\) with respect to \(p_{\tau }(c \mid d)\) to zero gives the approximate conditional probability density \(p_{\tau }(c \mid d)\) in the following form

$$\begin{aligned} p_{\tau }^{*}(c \mid d)=\arg \max ({\mathscr {L}}(p_{\tau }(c\mid d), \lambda (\tau )))=\frac{1}{Z(N,\lambda (\tau ))} \exp \left( - \displaystyle \sum _{k=0}^{N} \lambda _k(\tau ) \prod _{ j \in {\mathscr {C}}}c_j^{M_{j}^{k}}\right) , \end{aligned}$$

where \(Z(N,\lambda (\tau ))\) is a normalization constant [1, 4, 5]. Now, we can obtain the approximate solution of the joint probability density function which solves the HME under consideration by multiplying the obtained conditional probability function \( p_{\tau }^{*}(c \mid d)\) with the marginal probability function \(p_{\tau }(d)\).
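
For illustration, the same construction can be carried out numerically by minimizing the convex dual \(\ln Z(\lambda )+\sum _k \lambda _k {\mathscr {S}}_{\tau }^{M^k}\) on a discretized support, with the normalization constraint handled by explicit renormalization. The Python sketch below uses scipy rather than the CVX toolbox employed later in this paper; all names and the example moments are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def maxent_density(c_grid, moment_fns, targets):
    """Fit p(c) proportional to exp(-sum_k lam_k f_k(c)) on a uniform 1-D
    grid so that E[f_k(C)] matches `targets`, by minimizing the convex
    dual  ln Z(lam) + sum_k lam_k * targets_k."""
    F = np.stack([f(c_grid) for f in moment_fns])      # (K, n_grid)
    dc = c_grid[1] - c_grid[0]                         # uniform spacing
    tgt = np.asarray(targets, dtype=float)

    def dual(lam):
        s = -lam @ F                                   # log of unnormalized p
        smax = s.max()                                 # log-sum-exp trick
        logZ = smax + np.log(np.sum(np.exp(s - smax)) * dc)
        return logZ + lam @ tgt

    res = minimize(dual, x0=np.zeros(len(moment_fns)), method="BFGS")
    w = np.exp(-res.x @ F)
    return w / (w.sum() * dc)                          # normalized density

# Example: match a mean of 2.0 and a second moment of 5.0 on [0, 10]
grid = np.linspace(0.0, 10.0, 400)
p_fit = maxent_density(grid, [lambda c: c, lambda c: c ** 2], [2.0, 5.0])
```

At the optimum the gradient of the dual vanishes, which is exactly the moment-matching condition \({\mathsf {E}}[f_k(C)]={\mathscr {S}}_{\tau }^{M^k}\).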

7 Application

In the following, we apply our framework to a simple, illustrative reaction system. We consider two reactions of the form

$$\begin{aligned} R_{1} : \eta S_1{\mathop {\longrightarrow }\limits ^{\kappa _{1}}} \eta S_2, \quad R_{2}: S_2 {\mathop {\longrightarrow }\limits ^{\kappa _{2}}} S_1 \end{aligned}$$

with \(\eta \in {\mathbb {N}}\) and \(\eta >1\). Hence, the conversion of \(S_1\) into \(S_2\) occurs in infrequent bursts of size \(\eta \), while the reverse reaction happens in frequent single conversions. The latter reaction may thus be amenable to a diffusion approximation. The state vector of the system at time \(t \ge 0\) is defined by \(Y(t)=(Y_1(t),Y_2(t))^{T} \in {\mathbb {Z}}_{ \ge 0}^{2}\), where \(Y_i(t)\) denotes the number of molecules of species \(S_i\), \(i=1,2\).

The joint probability mass function \(p_t(d,c)\) over both reaction counters satisfies the following CME

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_t(d,c)= & {} \tilde{\alpha }_1(d-1,c) p_t(d-1,c)- \tilde{\alpha }_1(d,c) p_t(d,c) \nonumber \\+ & {} \tilde{\alpha }_2(d,c-1)p_t(d,c-1) - \tilde{\alpha }_2(d,c)p_t(d,c). \end{aligned}$$
(7.1)

For the sake of simplicity, and in order not to introduce another approximation step due to nonlinear propensities, we deviate from mass action kinetics and assume linear propensities, i.e., propensities affine in the counting variables

$$\begin{aligned} {\tilde{\alpha }}_1(d,c)= \kappa _1(\beta _1(d)+\gamma _1(c)), \, {\tilde{\alpha }}_2(d,c)= \kappa _2(\beta _2(d)+\gamma _2(c)), \end{aligned}$$

with

$$\begin{aligned} \beta _1(d)=y_1(0)-\eta d, \, \gamma _1(c)=c, \, \beta _2(d)=y_2(0)+\eta d, \gamma _2(c)=-c . \end{aligned}$$

According to the time scale separation mentioned above we now separate reactions and stoichiometric vectors as follows

$$\begin{aligned} {\mathscr {D}}=\{1\}, \, {\mathscr {C}}=\{2\},\, \mu _1^{D}=(-\eta ,\eta )^{T},\, \mu _2^{C}=(1,-1)^{T}. \end{aligned}$$

Then, the HME for the joint probability density function \(p_t(d,c)\) is given by:

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_t(d,c)= & {} \kappa _1(y_1(0)-\eta (d-1)+c) p_t(d-1,c)-\kappa _1(y_1(0)-\eta d+c) p_t(d,c)\nonumber \\&- \displaystyle \frac{\partial }{\partial c} \Big ( \kappa _2(y_2(0)+\eta d-c) p_t(d,c)\Big )+ \displaystyle \frac{1}{2}\displaystyle \frac{\partial ^2}{\partial c^2} \Big ( \kappa _2(y_2(0)+\eta d-c) p_t(d,c) \Big )\nonumber \\ \end{aligned}$$
(7.2)

The system of differential equations defining the marginal probabilities, the conditional means and the centered conditional moments has the following form (which we illustrate for the remainder of this section for \(\eta =2\))

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_{t}(d)= & {} \Big (\kappa _1(y_1(0)-2(d-1)) + \kappa _1 {\mathsf {E}}_t[C\mid d-1] \Big ) p_t(d-1) \nonumber \\&-\Big (\kappa _1(y_1(0)-2d)+ \kappa _1 {\mathsf {E}}_t[C\mid d] \Big ) p_t(d) \nonumber \\ p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[C\mid d]= & {} \kappa _1(y_1(0)-2(d-1)) {\mathsf {E}}_t[C\mid d-1] p_t(d-1) \nonumber \\&- \,\kappa _1(y_1(0)-2d) {\mathsf {E}}_t[C\mid d] p_t(d) + \kappa _1 {\mathsf {E}}_t[C^2\mid d-1] p_t(d-1) \nonumber \\&-\,\kappa _1 {\mathsf {E}}_t[C^2\mid d] p_t(d) + \kappa _2(y_2(0)+2d) p_t(d) \nonumber \\&-\, \kappa _2 {\mathsf {E}}_t[C\mid d] p_t(d) \nonumber \\&-\,{\mathsf {E}}_t[C\mid d] \displaystyle \frac{\partial }{\partial t} p_{t}(d)\nonumber \\ p_t(d)\displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[{\tilde{C}}^{2}\mid d]= & {} \kappa _1(y_1(0)-2(d-1)) {\mathsf {E}}_t[{\tilde{C}}^2\mid d-1]p_t(d-1) \nonumber \\&-\,\kappa _1(y_1(0)-2d) {\mathsf {E}}_t[{\tilde{C}}^2\mid d]p_t(d) \nonumber \\&+\,\kappa _1 {\mathsf {E}}_t [{\tilde{C}}^2 C\mid d-1] p_t(d-1)\nonumber \\&-\,\kappa _1 {\mathsf {E}}_t [{\tilde{C}}^2 C\mid d] p_t(d)\nonumber \\&+\, \kappa _2(y_2(0)+2d) p_t(d) \nonumber \\&-\,\kappa _2 {\mathsf {E}}_t [C\mid d] p_t(d)-2 \kappa _2 {\mathsf {E}}_t[{\tilde{C}}C\mid d]p_t(d)\nonumber \\&-\, {\mathsf {E}}_t[{\tilde{C}}^{2}\mid d] \displaystyle \frac{\partial }{\partial t} p_{t}(d). \end{aligned}$$
(7.3)

Based on our previous discussions, we get the following system of differential equation which will be referred to as the moment equation system of the HME in the rest of the study

$$\begin{aligned} \displaystyle \frac{\partial }{\partial t} p_{t}(d)= & {} \Big (\kappa _1(y_1(0)-2(d-1)) + \kappa _1 {\mathsf {E}}_t[C\mid d-1] \Big ) p_t(d-1) \nonumber \\&-\Big (\kappa _1(y_1(0)-2d)+ \kappa _1 {\mathsf {E}}_t[C\mid d] \Big ) p_t(d) \end{aligned}$$
(7.4)
$$\begin{aligned} p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[C\mid d]= & {} \kappa _1(y_1(0)-2(d-1)) {\mathsf {E}}_t[C\mid d-1] p_t(d-1)\nonumber \\&-\, \kappa _1(y_1(0)-2d) {\mathsf {E}}_t[C\mid d] p_t(d) \nonumber \\&+\, \kappa _1 {\mathsf {E}}_t[{\tilde{\varPsi }}^2\mid d-1] p_t(d-1)+ \kappa _1 ({\mathsf {E}}_t[C\mid d-1])^{2} p_t(d-1) \nonumber \\&- \, \kappa _1 {\mathsf {E}}_t[{\tilde{C}}^2\mid d] p_t(d) - \kappa _1 ({\mathsf {E}}_t[C\mid d])^{2} p_t(d) \nonumber \\&+\, \kappa _2(y_2(0)+2d) p_t(d)\nonumber \\&-\, \kappa _2 {\mathsf {E}}_t[C\mid d] p_t(d)-{\mathsf {E}}_t[C\mid d] \displaystyle \frac{\partial }{\partial t} p_{t}(d) \end{aligned}$$
(7.5)
$$\begin{aligned} p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[{\tilde{C}}^{2}\mid d]= & {} \Big ({\mathsf {E}}_t[{\tilde{\varPsi }}^2\mid d-1]+ \{{\mathsf {E}}_t[C\mid d-1]-{\mathsf {E}}_t[C\mid d] \}^2 \Big ) \kappa _1(y_1(0)\nonumber \\&-\,2(d-1)) p_t(d-1) \nonumber \\&-\, \kappa _1(y_1(0)-2d) {\mathsf {E}}_t[{\tilde{C}}^2\mid d]p_t(d) \nonumber \\&+\,\kappa _1 {\mathsf {E}}_t[C\mid d-1] {\mathsf {E}}_t[{\tilde{\varPsi }}^2\mid d-1] p_t(d-1) \nonumber \\&+\, 2 \kappa _1 \{{\mathsf {E}}_t[C\mid d-1]-{\mathsf {E}}_t[C\mid d] \} {\mathsf {E}}_t[{\tilde{\varPsi }}^2\mid d-1] p_t(d-1)\nonumber \\&+ \, \kappa _1 \{{\mathsf {E}}_t[C\mid d-1]-{\mathsf {E}}_t[C\mid d] \}^2 {\mathsf {E}}_t[C \mid d-1] p_t(d-1)\nonumber \\&-\, \kappa _1 {\mathsf {E}}_t[C\mid d] {\mathsf {E}}_t[{\tilde{C}}^2\mid d] p_t(d)\nonumber \\&+\, \kappa _2(y_2(0)+ 2d) p_t(d)-\kappa _2 {\mathsf {E}}_t[C\mid d]p_t(d)\nonumber \\&-\,2\kappa _2 {\mathsf {E}}_t[{\tilde{C}}^2\mid d] p_t(d) - {\mathsf {E}}_t[{\tilde{C}}^{2}\mid d] \displaystyle \frac{\partial }{\partial t} p_{t}(d), \end{aligned}$$
(7.6)

where \( {\tilde{\varPsi }}=c-{\mathsf {E}}_t[C \mid d -1]\).

Substituting \(\displaystyle \frac{\partial }{\partial t} p_{t}(d)\) into \(p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[C\mid d]\) and \( p_t(d) \displaystyle \frac{\partial }{\partial t} {\mathsf {E}}_t[{\tilde{C}}^{2}\mid d]\) gives a system of differential equations expressed only in terms of the marginal probabilities, the conditional means and the second centered moments. In our application, we close the moment equations by setting the third and higher moments to zero; a sketch of one integration step of this closed system is given below. If \( p_t(d)=0\), then we are unable to obtain \({\mathsf {E}}_t[C \mid d]\) and \({\mathsf {E}}_t[{\tilde{C}}^{2}\mid d] \). To avoid this drawback, the authors of [25] proposed a successful initialization procedure.
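
A minimal Python sketch of one explicit Euler step of this closed system (for \(\eta =2\), with the closure \({\mathsf {E}}_t[{\tilde{C}}^3 \mid d]=0\) already substituted) might look as follows; the variable names are our own, and the guard for \(p_t(d)\approx 0\) is a crude stand-in for the initialization procedure of [25]:

```python
import numpy as np

def hme_moment_step(p, m, v, k1, k2, y10, y20, dt):
    """One explicit Euler step of the closed moment system (7.4)-(7.6)
    for eta = 2; p, m, v are arrays over the truncated range d = 0..D-1
    holding p_t(d), E_t[C|d] and E_t[C~^2|d]."""
    D = len(p)
    dp, dm, dv = np.zeros(D), np.zeros(D), np.zeros(D)
    for d in range(D):
        b1, b1m = y10 - 2 * d, y10 - 2 * (d - 1)   # beta_1(d), beta_1(d-1)
        pm, mm, vm = (p[d - 1], m[d - 1], v[d - 1]) if d > 0 else (0.0, 0.0, 0.0)
        # Eq. (7.4): marginal probability of the slow counter d
        dp[d] = k1 * (b1m + mm) * pm - k1 * (b1 + m[d]) * p[d]
        if p[d] <= 1e-12:            # crude guard; see the remark on [25]
            continue
        # Eq. (7.5): conditional mean, with E[C^2|d] = v(d) + m(d)^2
        rhs_m = (k1 * b1m * mm * pm - k1 * b1 * m[d] * p[d]
                 + k1 * (vm + mm ** 2) * pm - k1 * (v[d] + m[d] ** 2) * p[d]
                 + k2 * (y20 + 2 * d) * p[d] - k2 * m[d] * p[d]
                 - m[d] * dp[d])
        dm[d] = rhs_m / p[d]
        # Eq. (7.6): conditional variance, third central moments closed to 0
        diff = mm - m[d]
        rhs_v = (k1 * b1m * (vm + diff ** 2) * pm - k1 * b1 * v[d] * p[d]
                 + k1 * mm * vm * pm + 2 * k1 * diff * vm * pm
                 + k1 * diff ** 2 * mm * pm - k1 * m[d] * v[d] * p[d]
                 + k2 * (y20 + 2 * d) * p[d] - k2 * m[d] * p[d]
                 - 2 * k2 * v[d] * p[d] - v[d] * dp[d])
        dv[d] = rhs_v / p[d]
    return p + dt * dp, m + dt * dm, v + dt * dv
```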

Based on the fact that propensity functions must be non-negative, we define the state space of the system as follows:

$$\begin{aligned} \varOmega = {\mathbf {D}} \times {\mathbf {C}}=\{(d,c) \in {\mathbf {D}} \times {\mathbb {R}}_{\ge 0}: y_1(0)-2d+c \ge 0, \; y_2(0)+2d-c \ge 0\}, \quad {\mathbf {D}} \subset {\mathbb {N}}. \end{aligned}$$

To obtain each conditional probability density function by solving the corresponding convex optimization problem on the state space of interest, we use the CVX toolbox of MATLAB [23]. When \(\varOmega \) is large, the dimensionality of the optimization problem grows accordingly, and CVX can no longer produce accurate results.
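
The study uses MATLAB/CVX; purely as an illustration, the analogous maximum-entropy convex program can be sketched with the Python package CVXPY (the function name, the grid, and the moment values in the usage line are our own placeholders, not the code used in the study):

```python
import numpy as np
import cvxpy as cp

def maxent_conditional(c_grid, mean, var):
    # Maximum-entropy density p(c | d) on a finite grid, matching the
    # first two conditional moments: a sketch of the convex program
    # that is solved with CVX in the text.
    p = cp.Variable(len(c_grid), nonneg=True)
    constraints = [
        cp.sum(p) == 1,                    # normalization
        c_grid @ p == mean,                # E[C | d]
        (c_grid - mean) ** 2 @ p == var,   # E[C~^2 | d]
    ]
    cp.Problem(cp.Maximize(cp.sum(cp.entr(p))), constraints).solve()
    return p.value

# Hypothetical usage with placeholder moment values:
c_grid = np.arange(0.0, 31.0)
p_cond = maxent_conditional(c_grid, mean=4.0, var=3.2)
```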

To keep the dimension of the optimization problems small for CVX, we construct the state space iteratively, using a strategy similar to the sliding window method [40]. In summary, we solve the moment equation system of the HME using an appropriate discretization method. At each discretization step, we check the marginal probabilities; if they exceed a given threshold, we extend the state space of the variable \(d\). This procedure continues until the time point of interest is reached. Finally, depending on the state space of the variable \(d\), we construct a state space for the variable \(c\). We now explain the details of the method.

In the first step of this construction, we define a feasible subset \(\varOmega ^0={\mathbf {D}}^0 \times {\mathbf {C}}^0\) of \(\varOmega \) in which the dimension of the optimization problem is acceptable for CVX. To avoid the problem of having \(p_0(d)=0\), we choose an initial Poisson distribution \(p_0(d,c)\) on the state space \(\varOmega ^0\) and compute the corresponding \(p_0(d)\), \({\mathsf {E}}_0[C \mid d]\), and \({\mathsf {E}}_0[{\tilde{C}}^2 \mid d]\), which serve as the initial conditions for the moment equation system of the HME.
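
As a sketch of this initialization (the product-of-truncated-Poissons form and the rates are illustrative assumptions, not necessarily the exact choice in the study; the grids match the numerical example below):

```python
import numpy as np
from scipy.stats import poisson

# D^0 = C^0 = {0,...,8} as in the numerical study below; the Poisson
# rates lam_d, lam_c are our own illustrative values.
d0, c0 = np.arange(9), np.arange(9)
lam_d, lam_c = 2.0, 2.0
joint = np.outer(poisson.pmf(d0, lam_d), poisson.pmf(c0, lam_c))
joint /= joint.sum()                 # truncate and renormalize on Omega^0

p0 = joint.sum(axis=1)               # p_0(d) > 0 for every d in D^0
m0 = (joint @ c0) / p0               # E_0[C | d]
v0 = (joint @ c0**2) / p0 - m0**2    # E_0[C~^2 | d]
```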

Assume that we want to obtain the conditional counting probability density at time point \(\tau >0\). We approximate the solution of the moment equation system of the HME on the time interval \([0, \tau ]\) using a numerical method. We choose a discretization time step \(\varDelta \) and define \(t_j=j\varDelta \), \(j=0,1,\ldots , J\), such that \(t_0=0\) and \(t_J=\tau \); this yields the subintervals \([t_j, t_{j+1}]\), \(j=0,1,\ldots , J-1\). Let \(p^{j}(d)\), \({\mathsf {E}}^{j}[C \mid d]\), \({\mathsf {E}}^{j}[{\tilde{C}}^2 \mid d]\) denote the approximate solution of the ODE system given in Eqs. (7.4)–(7.6), and let \({\mathbf {D}}^{j}\) denote the state space of \(d\) at time point \(t_j\). To construct \({\mathbf {D}}^1\), we solve the moment equation system of the HME with the initial conditions \(p^{0}(d) \equiv p_0(d)\), \( {\mathsf {E}}^{0} [C\mid d] \equiv {\mathsf {E}}_{0} [C\mid d] \), \( {\mathsf {E}}^{0} [{\tilde{C}}^2\mid d] \equiv {\mathsf {E}}_{0} [{\tilde{C}}^2\mid d] \), and obtain \(p^{1}(d)\), \({\mathsf {E}}^{1}[C\mid d]\), \({\mathsf {E}}^{1}[{\tilde{C}}^2\mid d]\) for each \(d\) in the state space

$$\begin{aligned} {\mathbf {D}}^{0}=\{d: \min ({\mathbf {D}}^{0}) \le d \le \max ({\mathbf {D}}^{0})\}. \end{aligned}$$

To extend \( {\mathbf {D}}^{0}\), we define a threshold \( \varepsilon >0 \) and check the marginal probability \(p^{1}_{max} \equiv p^{1}(\max ({\mathbf {D}}^{0}))\). If \(p^{1}_{max}> \varepsilon \), then we extend \({\mathbf {D}}^0\) as follows:

$$\begin{aligned} {\mathbf {D}}^{1}=\{d: \min ({\mathbf {D}}^{0}) \le d \le \max ({\mathbf {D}}^{0})+1\}. \end{aligned}$$

To approximate the solution of Eqs. (7.4)–(7.6) at time point \(t_2\), we need to initialize the system on \({\mathbf {D}}^{1}\). Although we know \(p^1(d)\), \({\mathsf {E}}^{1}[C\mid d]\), \({\mathsf {E}}^{1}[{\tilde{C}}^2\mid d]\) for \(d \in {\mathbf {D}}^{0}\), we have to impose initial conditions for \(d=\max ({\mathbf {D}}^{0})+1\):

$$\begin{aligned} p^1( \max (\mathbf {D}^{0})+1 )= & {} \frac{\displaystyle \sum _{d \in \mathbf {D}^0} p^1(d) }{\mid \mathbf {D}^0 \mid +1}, \quad \mathsf {E}^{1}[C \mid \max (\mathbf {D}^{0})+1 ]= \mathsf {E}^{1}[ C \mid \max (\mathbf {D}^{0})], \\ \mathsf {E}^{1}[\tilde{C}^2 \mid \max (\mathbf {D}^{0})+1]= & {} \mathsf {E}^{1}[ \tilde{C}^2 \mid \max (\mathbf {D}^{0})], \end{aligned}$$

where \(\mid {\mathbf {D}}^0 \mid \) denotes the cardinality of the subset \({\mathbf {D}}^0 \). We apply this procedure successively until the desired time point \(\tau \) is reached. Let \({\mathbf {D}}^{*}\) denote the state space of \(d\) at time point \(\tau \). Here, we must choose \(\varepsilon >0\) such that \({\mathbf {D}}^{*}\) also lies in the feasible region of CVX. Now, we can construct the feasible state space for \(c\), denoted by \({\mathbf {C}}^{*}\). Since we know the initial domain \(\varOmega ^0={\mathbf {D}}^{0} \times {\mathbf {C}}^{0} \), we only need to obtain \((d,c)\) pairs for \(d \in {\mathbf {D}}^{*} {\setminus } {\mathbf {D}}^{0}\). Then, for a given \(\epsilon >0\), we construct the feasible region \({\mathbf {C}}^{*}\) for the variable \(c\) as follows:

$$\begin{aligned} {\mathbf {C}}^{*}={\mathbf {C}}^{0} \cup \bar{{\mathbf {C}}} \quad \text{ with } \quad \bar{{\mathbf {C}}}= \displaystyle \bigcup _{d \in {\mathbf {D}}^{*} {\setminus } {\mathbf {D}}^{0} } {\mathbf {C}}_d, \end{aligned}$$

where

$$\begin{aligned} {\mathbf {C}}_d= & {} \Big \{c: \max (\min ({\mathbf {C}}^{0})- \epsilon \sigma , 0) \le c \le \max ({\mathbf {C}}^{0})+ \epsilon \sigma \, \wedge \, y_1(0)-2d+c \ge 0 \, \\&\quad \wedge \, y_2(0)+2d-c \ge 0 \Big \}, \end{aligned}$$

where \(\sigma =\sqrt{{\mathsf {E}}^{J}[{\tilde{C}}^2 \mid d]}\). Here, \(\max ({\mathbf {C}}^{0})\) and \(\min ({\mathbf {C}}^{0})\) denote the maximum and minimum values of \(c\) over the pairs \((c, \max ({\mathbf {D}}^{0})) \in \varOmega ^{0}\), respectively. As a result, we obtain a feasible region \(\varOmega ^{*}= {\mathbf {D}}^{*} \times {\mathbf {C}}^{*} \) for CVX. We can then solve the corresponding convex optimization problem for each conditional counting probability density \(p_{\tau }(c \mid d)\), \(d \in {\mathbf {D}}^{*}\), and finally compute the approximate solution of \(p_{\tau }(d,c)\). A sketch of the window construction \({\mathbf {C}}_d\) is given below, and the resulting overall procedure is presented in Algorithm 1.
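
As an illustration, the window \({\mathbf {C}}_d\) for a newly added \(d\) might be computed as follows (the function and its name are our own sketch; the integer grid mirrors the discrete example below):

```python
import numpy as np

def window_C_d(c0_min, c0_max, sigma, eps, d, y1, y2):
    # Feasible window C_d for a newly added discrete state d: widen the
    # previous range of c by eps*sigma on each side (clipped at zero) and
    # keep only values compatible with non-negative species amounts.
    lo = max(c0_min - eps * sigma, 0.0)
    hi = c0_max + eps * sigma
    grid = np.arange(np.ceil(lo), np.floor(hi) + 1)  # integer grid as in the example
    return grid[(y1 - 2 * d + grid >= 0) & (y2 + 2 * d - grid >= 0)]
```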

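A minimal Python sketch of Algorithm 1 under the stated closure (the Euler stepping, the extension rule, and the new-state initialization follow the text; the guard for vanishing \(p_t(d)\) and all names are our own assumptions; the maximum-entropy step at time \(\tau \) is the one sketched above):

```python
import numpy as np

def moment_rhs(p, m, v, y1, y2, k1, k2, tiny=1e-12):
    # Right-hand sides of the closed moment system, Eqs. (7.4)-(7.6):
    # p[d] = p_t(d), m[d] = E_t[C|d], v[d] = E_t[C~^2|d].
    D = len(p)
    dp, dm, dv = np.zeros(D), np.zeros(D), np.zeros(D)
    a = lambda d, mean: k1 * (y1 - 2 * d) + k1 * mean      # E[a_1 | d]
    for d in range(D):
        dp[d] = (a(d - 1, m[d - 1]) * p[d - 1] if d > 0 else 0.0) - a(d, m[d]) * p[d]
    for d in range(D):
        if p[d] < tiny:            # assumption: skip states with near-zero mass
            continue
        rm = (-k1 * (y1 - 2 * d) * m[d] * p[d] - k1 * (v[d] + m[d] ** 2) * p[d]
              + k2 * (y2 + 2 * d) * p[d] - k2 * m[d] * p[d] - m[d] * dp[d])
        rv = (-k1 * (y1 - 2 * d) * v[d] * p[d] - k1 * m[d] * v[d] * p[d]
              + k2 * (y2 + 2 * d) * p[d] - k2 * m[d] * p[d]
              - 2 * k2 * v[d] * p[d] - v[d] * dp[d])
        if d > 0:
            dlt = m[d - 1] - m[d]
            rm += (k1 * (y1 - 2 * (d - 1)) * m[d - 1] * p[d - 1]
                   + k1 * (v[d - 1] + m[d - 1] ** 2) * p[d - 1])
            rv += ((v[d - 1] + dlt ** 2) * k1 * (y1 - 2 * (d - 1)) * p[d - 1]
                   + k1 * m[d - 1] * v[d - 1] * p[d - 1]
                   + 2 * k1 * dlt * v[d - 1] * p[d - 1]
                   + k1 * dlt ** 2 * m[d - 1] * p[d - 1])
        dm[d], dv[d] = rm / p[d], rv / p[d]
    return dp, dm, dv

def solve_hme(p, m, v, y1, y2, k1, k2, tau, dt=1e-4, eps=1e-6):
    # Euler integration with the sliding-window extension of D.
    t = 0.0
    while t < tau:
        dp, dm, dv = moment_rhs(p, m, v, y1, y2, k1, k2)
        p, m, v = p + dt * dp, m + dt * dm, v + dt * dv
        t += dt
        if p[-1] > eps:                                   # extend D by one state and
            p = np.append(p, p.sum() / (len(p) + 1))      # initialize as in the text
            m, v = np.append(m, m[-1]), np.append(v, v[-1])
    return p, m, v
```

With the initialization sketched above and the parameters of the numerical study below, one would call, e.g., `solve_hme(p0, m0, v0, 50, 0, 0.2, 0.4, tau=0.5)`; the maximum-entropy problems then reconstruct \(p_{\tau }(c \mid d)\) on \(\varOmega ^{*}\).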

In our numerical simulation study, the state of the system is initialized as \(y(0)=(50,0)^{T}\), and the reaction rate constants of \(R_1\) and \(R_2\) are given by \( \kappa _1=0.2\, \mathrm {s}^{-1}\) and \( \kappa _2=0.4\, \mathrm {s}^{-1}\), respectively. We define

$$\begin{aligned} \varOmega = \{ (d,c) : 50-2d+c \ge 0, \, 2d-c \ge 0, \, d,c \in \{0,1,2,\ldots ,30\} \}. \end{aligned}$$

The initial state space \(\varOmega ^{0}\) is

$$\begin{aligned} \varOmega ^{0}=\{(d,c) \in \varOmega : \, d,c \in \{0,1,2,\ldots ,8\} \}. \end{aligned}$$
Fig. 1 The joint counting probability density function at time point \(\tau =0.5\). a The state space \(\varOmega \) is shown by the points with green markers only; \(\varOmega ^{0}\) is shown by the points with black edges; and \(\varOmega ^{*}\) is the union of the points with black and red edges. b The joint counting probability density function satisfying the CME given in Eq. (7.1). c The joint counting probability density function satisfying the HME given in Eq. (7.2)

Fig. 2 The joint counting probability density function at time point \(\tau =1\). a The state space \(\varOmega \) is shown by the points with green markers only; \(\varOmega ^{0}\) is shown by the points with black edges; and \(\varOmega ^{*}\) is the union of the points with black and red edges. b The joint counting probability density function satisfying the CME given in Eq. (7.1). c The joint counting probability density function satisfying the HME given in Eq. (7.2)

Table 1 The errors between the joint probability distributions, marginal distributions, centered conditional means, and second centered conditional moments of the CME satisfying Eq. (7.1) and of the HME obtained by Algorithm 1

In Figs. 1a and 2a, one can see the state space \(\varOmega \), shown by the points with green markers only, and \(\varOmega ^{0}\), shown by the points with black-edged markers. The threshold for extending the region of the variable \(d\) is \(\varepsilon = 10^{-6}\), and we set \(\epsilon =2\). We obtain the joint counting probability density function at time points \(\tau =0.5\) and \(\tau =1\), using the Euler method with fixed time step \(\varDelta =10^{-4}\). Figures 1a and 2a also show \(\varOmega ^{*}\) at time points \(\tau =0.5\) and \(\tau =1\), respectively; in both figures, \(\varOmega ^{*}\) is the union of the points denoted by markers with black and red edges. Figures 1b and 2b show the joint counting probability satisfying the CME given in Eq. (7.1) at time points \(\tau =0.5\) and \(\tau =1\), respectively. To obtain these figures, we solve the CME on the same restricted domain: we form the ODE system corresponding to the CME and integrate it with the Euler method using the fixed time step \(\varDelta =10^{-4}\).

Figures 1c and 2c show the approximate solution of \(p_{\tau }(d,c)\) satisfying Eq. (7.2), obtained with Algorithm 1 at time points \(\tau =0.5\) and \(\tau =1\), respectively.

In Table 1, one can see the two-norm of the differences between the joint probabilities, marginal probabilities, conditional means, and second central moments obtained from the CME given in Eq. (7.1) and from the moment equation system of the HME given in Eqs. (7.4)–(7.6). We note that the error between the joint probability distributions decreases as the size of the domain of the HME increases. On the other hand, the errors in the marginal probabilities and centered means increase with the size of the domain of the HME; this is a consequence of keeping the evolution equations of the centered moments only up to order \(2\).

8 Conclusion

In this study, we present the hybrid master equation for the jump-diffusion approximation, which models systems with a multi-scale nature. The idea of the jump-diffusion approximation is to separate reactions into fast and slow groups based on a derived error bound. Fast reactions are modeled using the diffusion approximation, while a Markov chain representation is employed for the slow reactions. Based on the work of Pawula [34], we prove that the joint probability density of this hybrid model over the reaction counting processes satisfies the hybrid master equation, which combines terms of the corresponding chemical master equation and of the Fokker–Planck equation. While [17] presents a state vector representation for reaction networks with a multi-scale nature, the current study complements it by deriving the evolution equation for the corresponding joint probability density over the reaction counting processes. We note that the main contribution of the paper is to prove that the joint probability density function of the jump-diffusion approximation satisfies the HME, which involves terms from the chemical master equation modeling the reaction counters of the slow reactions and terms from the Fokker–Planck equation modeling the reaction counters of the fast reactions conditioned on the counters of the slow reactions.

Although different solution procedures, such as the finite difference or finite element method [35], can be used to approximate the solution of the corresponding HME, we propose an algorithm based on the method of conditional moments [25]. In our algorithm, we write the joint probability density function as the product of the conditional counting probability density of the fast reactions, conditioned on the counting process of the slow reactions, and the marginal probability of the counting process of the slow reactions. To construct the conditional probability density functions at a specific time point, we use the maximum entropy approach and solve the resulting constrained optimization problems with the CVX toolbox of MATLAB. Given the restrictions of CVX on the dimensionality of the optimization problems, we present a method that constructs feasible regions for CVX.

Although we consider the derivation of the HME to be the main contribution of the paper, two questions remain open in the proposed simulation study, which approximates the solution of the HME. The first concerns the application area of the proposed algorithm. In the present paper, we have applied the algorithm to a multi-scale conversion process, which is a very small system, and we have used zero- or first-order mass action rate laws to exploit the linearity of the conditional means. Although there is no restriction on the propensity functions when deriving the evolution equations for the conditional means and centered conditional moments, it is not straightforward to apply the proposed algorithm to systems with nonlinear propensity functions. The second question concerns the extension of the state space when the system of interest is very large. In our application, we have one slow and one fast reaction; consequently, after solving the moment equation system of the HME on a feasible region acceptable for CVX, we can extend the state space of the discrete variable by adding one state to its initial feasible state space, and then extend the state space of the continuous variable accordingly. When there is more than one slow reaction in the system, the discrete variable becomes a vector, and extending its feasible region is no longer straightforward, which in turn complicates the extension of the feasible region of the state space of the continuous variable. We will address these two open questions in future work.