Elsevier

Theoretical Computer Science

Volume 893, 21 November 2021, Pages 41-59
Theoretical Computer Science

A large-scale assessment of exact lumping of quantitative models in the BioModels repository

https://doi.org/10.1016/j.tcs.2021.06.026Get rights and content

Highlights

  • We present a systematic analysis of the effectiveness of exact lumping methods on the BioModels repository.

  • Over 64% of the models analyzed can be reduced by at least one of the considered techniques.

  • On selected case studies, the reductions are shown to be independent of the actual values of the model parameters.

Abstract

Chemical reaction networks are a popular formalism for modeling biological processes which supports both a deterministic and a stochastic interpretation based on ordinary differential equations and continuous-time Markov chains, respectively. In most cases, these models do not enjoy analytical solution, thus typically requiring expensive computational methods based on numerical solvers or stochastic simulations. Exact model reduction techniques can be used as an aid to lower the analysis cost by providing reduced networks that preserve the dynamics of interest to the modeler without incurring any approximation error. We hereby consider a family of techniques for both deterministic and stochastic networks which are based on equivalence relations over the species in the network, leading to a coarse graining which provides the exact aggregate time-course evolution for each equivalence class. We present a large-scale empirical assessment on the BioModels repository by measuring their compression capability over 579 models. Through a number of selected case studies, we also show their ability in yielding physically interpretable reductions that can reveal dynamical patterns of the bio-molecular processes under consideration, independently of the values of the kinetic parameters used in the models.

Introduction

Computational models in systems biology combine biochemical and physiological knowledge to inform highly detailed mechanistic models of biological networks such as signaling pathways, protein-protein interaction networks, and genetic cascades. Mathematical models which equip such interaction networks with kinetic information generally lead to a dynamical-system representation in terms of a formal chemical reaction network (CRN),1 with two main interpretations based on ordinary differential equations (ODEs) and continuous-time Markov chains (CTMCs), respectively. In either case the model tracks the time-course evolution of all biochemical species in the network. In the ODE interpretation each species is associated with a variable of a system of (typically nonlinear) ODEs, which are analyzed from an initial condition that represents the initial concentration of each species [55]. In the CTMC interpretation [28], species are tracked discretely and each state is a vector of molecular counts, one component for each species. It is well known that these two representations can be formally related to each other under appropriate conditions, with the ODEs being the limit behavior when the number of molecules in the CRN is large enough [36]. Often it is useful to consider both interpretations—one would take the CTMC semantics as the ground-truth representation and the ODE as an approximation that estimates the first-order moments of the species' populations.

Unfortunately, in both cases the analysis can be expensive due to the lack of analytical solutions in general. Indeed, the modeler is typically left with computational approaches such as the numerical integration of ODEs (e.g., [3]) or stochastic simulation [28]. This is a major motivating issue for several lines of research aiming at easing the computational cost of the analysis, including efficient simulation methods (e.g., [29]) and approximation methods for stochastic chemical kinetics (e.g., [49]). Model reduction is a major line of research concerned with lowering the computational cost of the analysis of dynamical models. Under this term belong several techniques which exploit various properties such as time-scale separation (i.e., by identifying reactions evolving at considerably different rates), lumping (i.e., grouping variables into macro-variables for which it is possible to construct a coarse-grained model), and sensitivity analysis (i.e., by removing variables that have little influence on the dynamics of interest to the modeler). For a thorough review in the field of computational biology, the reader is referred to [50], and to [44] for a classical earlier review in chemical engineering. It is worth noting that the problem of model reduction transcends chemistry and biology, and is of relevance in many other computational sciences; see, for instance, the book by Antoulas for applications to control theory [1].

For the purposes of the present paper it is useful to distinguish between exact and approximate model reduction methods. The former class identifies a reduced model which preserves an aggregate dynamics of the original model precisely, i.e., without any approximation error [40], [53]. In other words, the reduction induces some loss of information, but the information which is kept remains exact. The latter class gives aggregate dynamics with some approximation error, whose degree of controllability depends on the specific technique under consideration. Exact model reduction methods can be seen as a universal pre-processing step for computational analysis because they do not add further inaccuracy beyond the inevitable disagreement between a model and the real (biological) system under study. On the other hand, they may require stricter criteria to guarantee exactness, which may limit their applicability in practice.

The main goal of this paper is to carry out a systematic evaluation on benchmark biological models of a class of methods for the exact reduction of a CRN. These methods are based on the computation of equivalence relations between its species. Here the original set of species is partitioned into blocks such that every original species belongs to exactly one block. The simplification consists in defining an aggregate CRN where each macro-species represents the overall dynamics (i.e., the sum of the original dynamics) of the species in each block. According to established terminology, such an equivalence relation identifies a so-called proper lumping [44], whereby every variable is guaranteed to contribute to precisely one of the macro-variables. We remark that, although methods concerned with improper lumping are available (e.g., [40], [45], they are outside the scope of the present contribution.

We consider five different techniques [10], [11], [13], [18], [17]. These are all defined as equivalence relations which share a common property, namely the possibility of developing an algorithm for computing the coarsest lumping using partition refinement, based on the seminal algorithm in [46]. Such algorithms apply iterative refinements to a given initial partition of species with which, for instance, one can isolate the observable species to be preserved in the reduction. The techniques differ according to the dynamical interpretation that they consider (i.e., deterministic versus stochastic), the assumption on the structure of the input CRN, and the dynamical property that they preserve.

  • Forward equivalence (FE) and backward equivalence (BE) apply to CRNs with deterministic semantics based on ODEs with polynomial derivatives (including mass-action kinetics as a notable instance), and lead to reduced models where the solution of each macro-species exactly corresponds to the sums of the solutions of the original variables belonging to a block [13].

    • With FE the reduced model can be used for any choice of the initial concentrations of the original model, but the time course of the concentration of one original species cannot be recovered from the reduced model in general.

    • BE aggregates species that have the same solutions at all time points, hence an individual solution may be recovered by simply dividing the solution of the corresponding aggregate by the cardinality of the equivalence class; however the dynamics of the reduced model can be related to that of the original one only if all species in the same partition block start with the same initial concentration in the original model.

  • Forward differential equivalence (FDE) and backward differential equivalence (BDE) are generalizations of FE and BE, respectively, that can be applied to CRNs where the underlying ODEs have nonlinearities beyond polynomials, e.g., rational expressions as in Hill kinetics [11].

  • Species equivalence (SE) is an equivalence relation for mass-action CRNs with CTMC interpretation. It identifies a reduced CRN where the marginal probability distribution of each macro-species gives the marginal probability distribution of the sum of the molecular counts of all species in that block.

FDE and BDE (hence, FE and BE) are shown to be conservative generalizations of the notions of ordinary and exact lumpability for CTMCs [9], respectively, when one considers their forward equations of motion, which are a system of linear ODEs. As a consequence of the fact that ordinary and exact lumpability are not comparable—i.e., neither implies the other)—FDE (resp., FE) and BDE (resp., BE) are not comparable either. This will also show in the models analyzed in this paper.

Assisted by ERODE [14], a publicly available software tool that implements the aforementioned equivalences, we carry out an assessment of the BioModels database [39], a well-known repository of quantitative models of biochemical systems. The models are available at https://www.erode.eu/examples.html. Our goal is to answer the following three main evaluation questions:

  • Q1.

    How restrictive are the assumptions required by the considered notions of equivalence? We answer this question by detailing how we translated the BioModels descriptions, available in the SBML format, into the input format of ERODE.

  • Q2.

    What is the effectiveness of exact model reduction by the equivalence relations? We measure effectiveness as the percentage of models that can be aggregated, as well as the compression ratio provided by the largest equivalence that preserves the observables specified in the original model.

  • Q3.

    What is the robustness of the reductions with respect to the original parameter values? For this question, we present a more detailed discussion of a selected number of models, for which we also provide a physical interpretation of the obtained reductions.

Relationship with [47].  This paper is based on and extends [47], where

  • a prototypical version of our SBML importer has been presented;

  • a less detailed evaluation has been performed on the BioModels repository;

  • no indication on the robustness of our reduction techniques to specific parameter values has been provided;

  • a preliminary, now superseded, notion of reduction for the CTMC semantics of MRN has been considered.

Specifically, as discussed in the reworked Section 3 (especially Section 3.2), we improved the quality of our SBML importer and fully integrated it into ERODE, allowing us to automatize all the importing and analysis tasks. Despite the number of supported models has not changed considerably with respect to what reported in [47], these improvements have allowed us to identify more mass-action models (70 rather than 43), and to considerably decrease the number of models on which the reduction techniques failed with errors or timeouts (22 rather than 198). Differently from [47], the analysis is provided separately for the curated and non-curated branches of the BioModels repository. In addition, in this paper we now consider three classes of CRNs instead of two classes in [47] (i.e., with mass-action and non-mass-action kinetics) by adding the class of polynomial reaction networks, that is CRNs with ODEs that have polynomial derivatives but do not obey the mass-action law. Furthermore, a section on selected case studies now shows novel applications of the techniques to identify reductions that are not dependent from the actual values of the model parameters. Finally, SMB [15], the reduction technique for the stochastic semantics of elementary MRNs has been replaced by the novel SE [18]. SE generalizes SMB because it can be applied to any MRN (rather than only to elementary ones with at most two reactants for SMB) and has more reduction power.

Section snippets

Background

In order to make the paper self-contained, in this section we briefly overview the main results regarding the equivalence notions used in our assessment. We refer to the original papers for the details and further examples, while tutorial-like presentations are given in [52], [54].

Overview of the BioModels repository

The BioModels Database is a repository of computational models of biological processes [39]. It hosts dynamical quantitative models described in peer-reviewed scientific literature as well as models generated automatically from pathway resources such as KEGG [34], BioCarta [43], MetaCyc [19], PID [48] and SABIO-RK [57]. BioModels covers a wide range of models from several biological categories such as biochemical reaction systems, kinetic models, metabolic networks, steady-state models and

Reduction results

We hereby report the results of our experiments. Section 4.1 describes the experimental configuration, while Section 4.2 and 4.3 discuss the reduction power of our techniques in terms of percentage of reduced models and obtained reduction ratios, respectively. Section 4.4 concludes the section by comparing the results obtained for the forward and backward reduction notions. Considerations on the actual analysis speedups obtainable thanks to our techniques are out of the scope of this paper,

Robustness of the reduction techniques to parameters values on selected case studies

A CRN with deterministic semantics can be extended into a new one with dynamically equivalent ODEs where the reductions are independent from the actual values of the model parameters. This enables the analysis of the robustness of the reduction techniques, a property that is particularly relevant in computational systems biology, where the parameters are not precisely known in general. The main idea behind this extension is to promote each parameter to a species with no dynamics (i.e., which

Conclusion

The empirical assessment of exact model reduction on the BioModels repository has provided a number of findings along the main evaluation questions Q1Q3 introduced in Section 1, which can be summarized as follows.

Q1 Assumptions for applicability of model reductions.  In the preprocessing phase (Section 3.3), we found 494 models not supported by ERODE. Among the reasons for incompatibility it is worth commenting on the models which included exponential expressions in rate functions. This is not

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors are grateful to Andreas Dräger (Institut für Informatik Zentrum für Bioinformatik Tübingen) for his support with JSBML. Partially supported by the Independent Research Fund Denmark DFF Research Project 9040-00224B REDUCTO, and the PRIN project “SEDUCE” no. 2017TWRCNB.

References (57)

  • S. Kauffman

    Metabolic stability and epigenesis in randomly constructed genetic nets

    J. Theor. Biol.

    (1969)
  • G. Li et al.

    A general analysis of exact lumping in chemical kinetics

    Chem. Eng. Sci.

    (1989)
  • D. Pratt et al.

    Ndex, the network data exchange

    Cell Syst.

    (2015)
  • A. Antoulas

    Approximation of Large-Scale Dynamical Systems

    (2005)
  • G. Argyris et al.

    Reducing Boolean networks with backward Boolean equivalence

  • U.M. Ascher et al.

    Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations

    (1988)
  • M. Ashburner et al.

    Gene ontology: tool for the unification of biology

    Nat. Genet.

    (2000)
  • A. Begitt et al.

    Stat1-cooperative dna binding distinguishes type 1 from type 2 interferon signaling

    Nat. Immunol.

    (2014)
  • N. Benes et al.

    A model checking approach to discrete bifurcation analysis

  • M. Boreale

    Algebra, coalgebra, and minimization in polynomial differential equations

  • K.S. Brown et al.

    The statistical mechanics of complex signaling networks: nerve growth factor signaling

    Phys. Biol.

    (2004)
  • P. Buchholz

    Exact and ordinary lumpability in finite Markov chains

    J. Appl. Probab.

    (1994)
  • L. Cardelli et al.

    Forward and backward bisimulations for chemical reaction networks

  • L. Cardelli et al.

    Symbolic computation of differential equivalences

  • L. Cardelli et al.

    Efficient syntax-driven lumping of differential equations

  • L. Cardelli et al.

    Maximal aggregation of polynomial dynamical systems

    Proc. Natl. Acad. Sci.

    (2017)
  • L. Cardelli et al.

    Erode: a tool for the evaluation and reduction of ordinary differential equations

  • L. Cardelli et al.

    Syntactic Markovian bisimulation for chemical reaction networks

  • L. Cardelli et al.

    Guaranteed error bounds on approximate model abstractions through reachability analysis

  • L. Cardelli et al.

    Symbolic computation of differential equivalences

    Theor. Comput. Sci.

    (2019)
  • L. Cardelli et al.

    Exact maximal reduction of stochastic reaction networks by species lumping

    Bioinformatics

    (2021)
  • R. Caspi et al.

    The metacyc database of metabolic pathways and enzymes and the biocyc collection of pathway/genome databases

    Nucleic Acids Res.

    (2013)
  • A.K. Caydasi et al.

    A dynamical model of the spindle position checkpoint

    Mol. Syst. Biol.

    (2012)
  • V. Chelliah et al.

    Biomodels database: a repository of mathematical models of biological processes

  • U. Consortium

    Uniprot: a hub for protein information

    Nucleic Acids Res.

    (2014)
  • M. Courtot et al.

    Controlled vocabularies and semantics in systems biology

    Mol. Syst. Biol.

    (2011)
  • L. De Moura et al.

    Z3: an efficient SMT solver

  • K. Degtyarenko et al.

    Chebi: a database and ontology for chemical entities of biological interest

    Nucleic Acids Res.

    (2007)
  • Cited by (4)

    View full text