A large-scale assessment of exact lumping of quantitative models in the BioModels repository
Introduction
Computational models in systems biology combine biochemical and physiological knowledge to inform highly detailed mechanistic models of biological networks such as signaling pathways, protein-protein interaction networks, and genetic cascades. Mathematical models which equip such interaction networks with kinetic information generally lead to a dynamical-system representation in terms of a formal chemical reaction network (CRN),1 with two main interpretations based on ordinary differential equations (ODEs) and continuous-time Markov chains (CTMCs), respectively. In either case the model tracks the time-course evolution of all biochemical species in the network. In the ODE interpretation each species is associated with a variable of a system of (typically nonlinear) ODEs, which are analyzed from an initial condition that represents the initial concentration of each species [55]. In the CTMC interpretation [28], species are tracked discretely and each state is a vector of molecular counts, one component for each species. It is well known that these two representations can be formally related to each other under appropriate conditions, with the ODEs being the limit behavior when the number of molecules in the CRN is large enough [36]. Often it is useful to consider both interpretations—one would take the CTMC semantics as the ground-truth representation and the ODE as an approximation that estimates the first-order moments of the species' populations.
Unfortunately, in both cases the analysis can be expensive due to the lack of analytical solutions in general. Indeed, the modeler is typically left with computational approaches such as the numerical integration of ODEs (e.g., [3]) or stochastic simulation [28]. This is a major motivating issue for several lines of research aiming at easing the computational cost of the analysis, including efficient simulation methods (e.g., [29]) and approximation methods for stochastic chemical kinetics (e.g., [49]). Model reduction is a major line of research concerned with lowering the computational cost of the analysis of dynamical models. Under this term belong several techniques which exploit various properties such as time-scale separation (i.e., by identifying reactions evolving at considerably different rates), lumping (i.e., grouping variables into macro-variables for which it is possible to construct a coarse-grained model), and sensitivity analysis (i.e., by removing variables that have little influence on the dynamics of interest to the modeler). For a thorough review in the field of computational biology, the reader is referred to [50], and to [44] for a classical earlier review in chemical engineering. It is worth noting that the problem of model reduction transcends chemistry and biology, and is of relevance in many other computational sciences; see, for instance, the book by Antoulas for applications to control theory [1].
For the purposes of the present paper it is useful to distinguish between exact and approximate model reduction methods. The former class identifies a reduced model which preserves an aggregate dynamics of the original model precisely, i.e., without any approximation error [40], [53]. In other words, the reduction induces some loss of information, but the information which is kept remains exact. The latter class gives aggregate dynamics with some approximation error, whose degree of controllability depends on the specific technique under consideration. Exact model reduction methods can be seen as a universal pre-processing step for computational analysis because they do not add further inaccuracy beyond the inevitable disagreement between a model and the real (biological) system under study. On the other hand, they may require stricter criteria to guarantee exactness, which may limit their applicability in practice.
The main goal of this paper is to carry out a systematic evaluation on benchmark biological models of a class of methods for the exact reduction of a CRN. These methods are based on the computation of equivalence relations between its species. Here the original set of species is partitioned into blocks such that every original species belongs to exactly one block. The simplification consists in defining an aggregate CRN where each macro-species represents the overall dynamics (i.e., the sum of the original dynamics) of the species in each block. According to established terminology, such an equivalence relation identifies a so-called proper lumping [44], whereby every variable is guaranteed to contribute to precisely one of the macro-variables. We remark that, although methods concerned with improper lumping are available (e.g., [40], [45], they are outside the scope of the present contribution.
We consider five different techniques [10], [11], [13], [18], [17]. These are all defined as equivalence relations which share a common property, namely the possibility of developing an algorithm for computing the coarsest lumping using partition refinement, based on the seminal algorithm in [46]. Such algorithms apply iterative refinements to a given initial partition of species with which, for instance, one can isolate the observable species to be preserved in the reduction. The techniques differ according to the dynamical interpretation that they consider (i.e., deterministic versus stochastic), the assumption on the structure of the input CRN, and the dynamical property that they preserve.
- •
Forward equivalence (FE) and backward equivalence (BE) apply to CRNs with deterministic semantics based on ODEs with polynomial derivatives (including mass-action kinetics as a notable instance), and lead to reduced models where the solution of each macro-species exactly corresponds to the sums of the solutions of the original variables belonging to a block [13].
- –
With FE the reduced model can be used for any choice of the initial concentrations of the original model, but the time course of the concentration of one original species cannot be recovered from the reduced model in general.
- –
BE aggregates species that have the same solutions at all time points, hence an individual solution may be recovered by simply dividing the solution of the corresponding aggregate by the cardinality of the equivalence class; however the dynamics of the reduced model can be related to that of the original one only if all species in the same partition block start with the same initial concentration in the original model.
- –
- •
Forward differential equivalence (FDE) and backward differential equivalence (BDE) are generalizations of FE and BE, respectively, that can be applied to CRNs where the underlying ODEs have nonlinearities beyond polynomials, e.g., rational expressions as in Hill kinetics [11].
- •
Species equivalence (SE) is an equivalence relation for mass-action CRNs with CTMC interpretation. It identifies a reduced CRN where the marginal probability distribution of each macro-species gives the marginal probability distribution of the sum of the molecular counts of all species in that block.
Assisted by ERODE [14], a publicly available software tool that implements the aforementioned equivalences, we carry out an assessment of the BioModels database [39], a well-known repository of quantitative models of biochemical systems. The models are available at https://www.erode.eu/examples.html. Our goal is to answer the following three main evaluation questions:
- Q1.
How restrictive are the assumptions required by the considered notions of equivalence? We answer this question by detailing how we translated the BioModels descriptions, available in the SBML format, into the input format of ERODE.
- Q2.
What is the effectiveness of exact model reduction by the equivalence relations? We measure effectiveness as the percentage of models that can be aggregated, as well as the compression ratio provided by the largest equivalence that preserves the observables specified in the original model.
- Q3.
What is the robustness of the reductions with respect to the original parameter values? For this question, we present a more detailed discussion of a selected number of models, for which we also provide a physical interpretation of the obtained reductions.
Relationship with [47]. This paper is based on and extends [47], where
- •
a prototypical version of our SBML importer has been presented;
- •
a less detailed evaluation has been performed on the BioModels repository;
- •
no indication on the robustness of our reduction techniques to specific parameter values has been provided;
- •
a preliminary, now superseded, notion of reduction for the CTMC semantics of MRN has been considered.
Section snippets
Background
In order to make the paper self-contained, in this section we briefly overview the main results regarding the equivalence notions used in our assessment. We refer to the original papers for the details and further examples, while tutorial-like presentations are given in [52], [54].
Overview of the BioModels repository
The BioModels Database is a repository of computational models of biological processes [39]. It hosts dynamical quantitative models described in peer-reviewed scientific literature as well as models generated automatically from pathway resources such as KEGG [34], BioCarta [43], MetaCyc [19], PID [48] and SABIO-RK [57]. BioModels covers a wide range of models from several biological categories such as biochemical reaction systems, kinetic models, metabolic networks, steady-state models and
Reduction results
We hereby report the results of our experiments. Section 4.1 describes the experimental configuration, while Section 4.2 and 4.3 discuss the reduction power of our techniques in terms of percentage of reduced models and obtained reduction ratios, respectively. Section 4.4 concludes the section by comparing the results obtained for the forward and backward reduction notions. Considerations on the actual analysis speedups obtainable thanks to our techniques are out of the scope of this paper,
Robustness of the reduction techniques to parameters values on selected case studies
A CRN with deterministic semantics can be extended into a new one with dynamically equivalent ODEs where the reductions are independent from the actual values of the model parameters. This enables the analysis of the robustness of the reduction techniques, a property that is particularly relevant in computational systems biology, where the parameters are not precisely known in general. The main idea behind this extension is to promote each parameter to a species with no dynamics (i.e., which
Conclusion
The empirical assessment of exact model reduction on the BioModels repository has provided a number of findings along the main evaluation questions Q1–Q3 introduced in Section 1, which can be summarized as follows.
Q1 Assumptions for applicability of model reductions. In the preprocessing phase (Section 3.3), we found 494 models not supported by ERODE. Among the reasons for incompatibility it is worth commenting on the models which included exponential expressions in rate functions. This is not
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
The authors are grateful to Andreas Dräger (Institut für Informatik Zentrum für Bioinformatik Tübingen) for his support with JSBML. Partially supported by the Independent Research Fund Denmark DFF Research Project 9040-00224B REDUCTO, and the PRIN project “SEDUCE” no. 2017TWRCNB.
References (57)
Metabolic stability and epigenesis in randomly constructed genetic nets
J. Theor. Biol.
(1969)- et al.
A general analysis of exact lumping in chemical kinetics
Chem. Eng. Sci.
(1989) - et al.
Ndex, the network data exchange
Cell Syst.
(2015) Approximation of Large-Scale Dynamical Systems
(2005)- et al.
Reducing Boolean networks with backward Boolean equivalence
- et al.
Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations
(1988) - et al.
Gene ontology: tool for the unification of biology
Nat. Genet.
(2000) - et al.
Stat1-cooperative dna binding distinguishes type 1 from type 2 interferon signaling
Nat. Immunol.
(2014) - et al.
A model checking approach to discrete bifurcation analysis
Algebra, coalgebra, and minimization in polynomial differential equations
The statistical mechanics of complex signaling networks: nerve growth factor signaling
Phys. Biol.
Exact and ordinary lumpability in finite Markov chains
J. Appl. Probab.
Forward and backward bisimulations for chemical reaction networks
Symbolic computation of differential equivalences
Efficient syntax-driven lumping of differential equations
Maximal aggregation of polynomial dynamical systems
Proc. Natl. Acad. Sci.
Erode: a tool for the evaluation and reduction of ordinary differential equations
Syntactic Markovian bisimulation for chemical reaction networks
Guaranteed error bounds on approximate model abstractions through reachability analysis
Symbolic computation of differential equivalences
Theor. Comput. Sci.
Exact maximal reduction of stochastic reaction networks by species lumping
Bioinformatics
The metacyc database of metabolic pathways and enzymes and the biocyc collection of pathway/genome databases
Nucleic Acids Res.
A dynamical model of the spindle position checkpoint
Mol. Syst. Biol.
Biomodels database: a repository of mathematical models of biological processes
Uniprot: a hub for protein information
Nucleic Acids Res.
Controlled vocabularies and semantics in systems biology
Mol. Syst. Biol.
Z3: an efficient SMT solver
Chebi: a database and ontology for chemical entities of biological interest
Nucleic Acids Res.
Cited by (4)
Minimization of Dynamical Systems over Monoids
2023, Proceedings - Symposium on Logic in Computer ScienceExact Linear Reduction for Rational Dynamical Systems
2022, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)