Introducing KIPET: A novel open-source software package for kinetic parameter estimation from experimental datasets including spectra

https://doi.org/10.1016/j.compchemeng.2019.106716Get rights and content

Abstract

This paper presents KIPET (Kinetic Parameter Estimation Toolkit) an open-source toolbox for the determination of kinetic parameters from a variety of experimental datasets including spectra and concentrations. KIPET seeks to overcome limitations of standard parameter estimation packages by applying a unified optimization framework based on maximum likelihood principles and large-scale nonlinear programming strategies for solving estimation problems that involve systems of nonlinear differential algebraic equations (DAEs). The software is based on recent advances proposed by Chen et al. (2016) and puts their original framework into an accessible framework for practitioners and academics. The software package includes tools for data preprocessing, estimability analysis, and determination of parameter confidence levels for a variety of problem types. In addition KIPET introduces informative wavelength selection to improve the lack of fit. All these features have been implemented in Python with the algebraic modeling package Pyomo. KIPET exploits the flexibility of Pyomo to formulate and discretize the dynamic optimization problems that arise in the parameter estimation algorithms. The solution of the optimization problems is obtained with the nonlinear solver IPOPT and confidence intervals are obtained through the use of either sIPOPT or a newly developed tool, k_aug. The capabilities as well as ease of use of KIPET are demonstrated with a number of examples.

Introduction

The determination of accurate reaction kinetics is imperative in many industries to ensure safe, controllable, and scalable processes. In research-based chemical industries, particularly the pharmaceutical industry during the early development phase, little is known about the reaction being studied in advance and it is difficult to determine the chemical species present, reaction mechanisms, and kinetic parameters involved from obtained experimental datasets. Since many of the experiments are costly to run, it is important to develop tools that are able to maximize the information obtained from each experiment. Typical data types that are collected during experiments include spectroscopic data (infrared, near-infrared, ultraviolet-visible, Raman etc.), as well as high-performance liquid chromatography, ultra-performance liquid chromatography and calorimetric data. This section will focus on research around the challenges regarding spectroscopic measurements. However, the software presented in the rest of this paper can be extended to estimate parameters from other measurement types as well.

For estimates from spectroscopic data, self-modeling curve resolution techniques can be used to obtain concentration profiles and pure species’ absorbance profiles through a bilinear decomposition. Lawton and Sylvestre (1971) introduced Beer-Lambert’s law into spectroscopic multivariate curve resolution (MCR) techniques, thereby providing physical meaning to this bilinear decomposition (Lawton, Sylvestre, 1971, Lawton, Sylvestre, 1971). Beer-Lambert’s law states that the profiles of the pure components are related through:D=CST+E, where the data matrix D is of dimension ntp × nwp with measured time points ti,i=1,ntp and measured wavelengths λl,l=1,nwp, C is the molar concentration matrix of dimension ntp  × nc with species ck,k=1,,nc, where nc denotes the number of components, and S is the matrix of dimension nc x nwp. E is then a ntp × nwp matrix containing the residual variations in the data. Their approach was to attempt to obtain the estimates of both C and S simultaneously using what is referred to as a soft-modeling approach. By adding non-negativity constraints on C and S the approach guarantees physical meaning and somewhat reduces rotational ambiguity. This class of approaches also contains other bilinear decomposition methods including Principal Component Analysis (PCA), which provides a more abstract decomposition of the data, maximizing the explained covariance while constraining the orthonormality of the components (Massart, Vandeginste, Buydens, Jong, Lewi, Smeyers-Verbeke, 1997, Jackson, 1991). A limitation of these approaches is that permutation, intensity, and rotational ambiguity result in solutions that cannot be guaranteed to be unique. Additionally, since no model is postulated, it is difficult to determine whether the solution is a feasible representation of the data.

Another approach to MCR to determine kinetic parameters, is based upon the Gauss–Newton–Levenberg–Marquardt approach and often referred to as a hard-modeling approach (Levenberg, 1944, Marquardt, 1963). In this method, a reaction model is proposed, kinetic parameters initialized, and then a system of equations is integrated. This gives initial concentration profiles of the species, which are used to obtain the pure component absorption profiles from the data matrix by least squares regression. The Gauss–Newton–Levenberg–Marquardt approach is then used to update the kinetic parameter values based on the residuals and derivative information with respect to the kinetic parameters (Bijlsma, Boelens, Hoefsloot, Smilde, 2000, Bijlsma, Boelens, Smilde, 2001, Puxty, Maeder, Hungerbühler, 2006). This procedure is repeated until convergence. Due to the large size of the data matrices, problem reduction approaches based on factor analysis have been proposed which have proven to be reliable and have thus been commonly applied. An example of this was proposed by Maeder and Zuberbühler, 1990, where D and S are projected into the subspace spanned by the right-singular vectors of D.

Instead of using a hard- or soft-modeling approach, a hybrid approach was proposed by De Juan et al. (2000), whereby a kinetic model is incorporated into MCR-ALS (MCR-Alternating Least Squares). Here, the least squares calculation of the concentration matrix is performed using soft constraints and the kinetic model is then fitted to this matrix to obtain kinetic parameters. This results in a new concentration matrix that can be used instead of the one modeled using soft constraints which, in turn, can be used to obtain the pure species absorption profiles under soft constraints. This procedure is repeated until a lack-of-fit below a certain threshold is obtained. This methodology has found wide-spread academic use. However, it is not commonly used in industrial applications due to the slow convergence and challenges pertaining to large datasets (Chen et al., 2018).

Indirect spectral hard modeling was introduced by Alsmeyer et al. (2004) and makes use of parameters to build models for the pure component spectra, allowing for greater degrees of flexibility during data fitting. The method is effective for situations where calibration methods fail, but it requires that all pure component spectra are known in advance and therefore cannot detect intermediates or accommodate non-absorbing species (Alsmeyer et al., 2004). Neymeyr et al. (2010) proposed an alternative method, called pure component decomposition (PCD), where singular value decomposition (SVD) is first used in order to obtain a low-rank approximation of D and then a rotation matrix is introduced. An objective function which minimizes the error between the approximation and original D matrix is chosen for minimization. Instead of adding constraints for non-negativity, penalties are introduced into the objective function with weight factors. Their implementation uses numerous optimization techniques to solve this regularized system for the rotation matrix, which provides the factorization for the D matrix. Furthermore, (Sawall et al., 2012) proposed an extension to PCD through combination with a reaction model, whereby the error between the model concentrations and the concentration matrix are used to regularize the objective function (Sawall et al., 2012). This approach, which they called a model-free approach, can obtain kinetic parameters and the rotation matrix simultaneously; however there are many associated weighting factors that need to be determined for the regularization terms, which need to be estimated without proposed guidelines.

All of the methods mentioned above suffer from slow convergence and poor performance when instability or ill-conditioning are present in the dynamic system. These methods will fail whenever there are dependent columns in the concentration matrix, which is not an uncommon occurrence, because the CTC matrix is required to be non-singular. Additionally, these methods do not consider system noise in the reaction model, nor do they provide convergence guarantees. More details of these sequential approaches are presented by Ruckebusch and Blanchet (2013); Golshan et al. (2016). In order to overcome these disadvantages, an entirely new approach was presented by Chen et al. (2016).

The proposed alternative method to these hard- and soft-modeled approaches is to simultaneously obtain the reaction kinetic parameters with the curve resolution. This framework, which will be discussed in more detail in Section 2, makes use of maximum likelihood principles, collocation methods, and large-scale nonlinear programming (NLP). The two steps involved in this framework include a noise deconvolution step, to estimate the measurement noise separately from the model variable noise, followed by a simultaneous parameter estimation that also provides confidence intervals based on the optimal solution sensitivities. This method has been tested extensively, and has been proven to be successful in obtaining robust solutions for a number of different scenarios, including examples with both real and simulated data. Furthermore its capabilities can be extended to address more complex problems (Chen, Biegler, García-Muñoz, 2018, Chen, Biegler, García-Muñoz, 2019).

While the number of methodologies shown here are effective under certain assumptions and with certain user skill levels, there are no standard toolkits or methods applied across the chemical and pharmaceutical industries for obtaining kinetic parameters from spectra. A major disadvantage of current approaches is that practitioners are forced to use estimates or only small subsets of data to design further experiments and make predictions about the reaction systems. A typical example of one of the most common methods would be to identify known peaks of certain species, track these peaks to estimate the concentration of those species, and then use these species concentration profiles with least squares estimation to determine the reaction kinetics. While these approaches can be informative in the absence of other methods, they fail to make use of all of the valuable information that can be collected during experiments, and may not provide much information regarding the fundamental reaction mechanisms.

The freely available MCR-ALS GUI2.0 (Jaumot et al., 2015), is available for MATLAB and as a Python package called ”pyMCR”. This software is an updated implementation of the previously discussed MCR-ALS technique, but therefore also has the same shortcomings mentioned previously.

In the open-source domain, toolkits such as ACADOs (Verschueren et al., 2018) and CasADi (Andersson et al., 2018) provide many useful features for general parameter estimation problems and sensitivity analysis of solutions, but do not provide kinetic parameter estimation from spectra explicitly. Similarly, GPdoemd (Olofsson et al., 2019), and the package developed in MC++ by Paulen et al. (2013), provide good general parameter estimation techniques, but do not explicitly couple kinetic parameter estimation with the full spectra problem. The fairly recent HyperSpy can be used to develop hard- and soft-modeling approaches for the estimation of kinetic parameters from spectra. However these require detailed knowledge of the problem and Python programming skills in order to successfully write custom estimation code (de la Peña et al., 2018). HyperSpy is mostly meant for data visualization and manipulation purposes.

On the other hand, commercial software packages such as gPROMS are used by many practitioners for their appealing user interface, ready-made and easily implemented solution strategies, and customer service. However these packages do not provide methods to directly incorporate spectra for simultaneous parameter estimation; rather using implicit measurements as mentioned above. gPROMS allows users to test candidate models sequentially, based on these direct responses and then allows users to validate their model by providing statistical tools (Process Systems Enterprise, 1997-2018).

Similarly, the GREG and GREGPLUS (General regression) tools from Athena Visual Studio can be used for nonlinear parameter estimation and process model analysis from single- or multi-response data (Stewart, Caracotsios, Sørensen, 1992, Stewart, Caracotsios, 2008). The software is mainly used for model discrimination and includes a number of different objective function formulations for users to choose from as well as tools for estimability based on the rival models and datasets, but cannot deal directly with spectra.

With a lack of software for the treatment of kinetic parameter estimation problems from spectra, combined with the proliferation of multivariate data collection techniques for batch reaction systems and the relatively new and highly adaptable formulation from Chen et al. (2016), there exists an opportunity to create a new, multi-use tool that can be easily extended and manipulated to perform a wide-array of parameter estimation strategies. Many industrial practitioners combine the commercial tools above with a large number of custom codes in different modeling environments to solve their problems. We present a new open-source framework that industrial and research practitioners can utilize to perform a wide-range of tasks related to the MCR problem.

The remainder of this paper is set out as follows: the next section will detail the methodology behind KIPET, an open-source Python package for parameter estimation from spectroscopic and other data sources, and will give an overview of the software package itself with a specific focus on the various classes and methods. A number of useful functionalities are discussed. Section 3 contains case studies that demonstrate the proposed approach, with sample code and results directly from the software. Finally, we present our conclusions and planned future work on the package.

Section snippets

Methodology

The KIPET framework is based on the model formulations and concepts developed in Chen et al. (2016). Their mathematical formulations have already been shown to be effective for problems involving large numbers of components and reactions, as well as effective for cases involving large real and simulated datasets. In this section we review the most important features of their work and the algorithms and methods implemented in the toolbox. It should be noted, before beginning this section, that

Results

In this section we will demonstrate the utility of the software package described above through the use of illustrative examples with sample code. All of the examples from the paper are included in the Github repository. Note that, regarding the significant figures reported in the example solutions, it should be assessed by the user as to whether the magnitude and number of significant figures are physically meaningful, i.e. whether experimental uncertainty, determined through the confidence

Conclusion

The analysis of data resulting from spectroscopic measurements on batch reactors is an extremely challenging problem, especially when it is desired to directly obtain kinetic parameters and their confidence regions from large noisy datasets. It is crucial to maximize the information obtained from each experiment as each experiment can be costly to run. Since few software packages exist for the determination of kinetic parameters directly from spectra, and with the development of an easily

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgements

The authors would like to gratefully acknowledge the funding of Eli Lilly and Company, along with Pfizer Inc. for postdoctoral fellowship (to C.S.).

References (57)

  • C. Ruckebusch et al.

    Multivariate curve resolution: a review of advanced and tailored applications and challenges

    Analytica Chimica Acta

    (2013)
  • M. Short et al.

    Kipet - an open-source kinetic parameter estimation toolkit

    Computer Aided Chemical Engineering, Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design

    (2019)
  • V.M. Zavala et al.

    Interior-point decomposition approaches for parallel solution of large-scale nonlinear parameter estimation problems

    Chem. Eng. Sci.

    (2008)
  • F. Alsmeyer et al.

    Indirect spectral hard modeling for the analysis of reactive and interacting mixtures

    Appl. Spectrosc.

    (2004)
  • J.A.E. Andersson et al.

    Casadi - A software framework for nonlinear optimization and optimal control

    Math. Program. Comput.

    (2018)
  • R. Barnes et al.

    Standard normal variate transformation and de-Trending of near-Infrared diffuse reflectance spectra

    Appl. Spectrosc.

    (1989)
  • L. Biegler

    Nonlinear Programming

    (2010)
  • S. Bijlsma et al.

    Determination of rate constants in second-order kinetics using UV-visible spectroscopy

    Appl. Spectrosc.

    (2001)
  • J.M. Brenchley et al.

    Wavelength selection characterization for NIR spectra

    Appl. Spectrosc.

    (1997)
  • L. Cécillon et al.

    Variable selection in near infrared spectra for the biological characterization of soil and earthworm casts

    Soil Biol. Biochem.

    (2008)
  • W. Chen et al.

    An approach for simultaneous estimation of reaction kinetics and curve resolution from process and spectral data

    J. Chemom.

    (2016)
  • W. Chen et al.

    Kinetic parameter estimation based on spectroscopic data with unknown absorbing species

    AIChE J.

    (2018)
  • W. Chen et al.

    A unified framework for kinetic parameter estimation based on spectroscopic data with or without unwanted contributions

    Ind. Eng. Chem. Res.

    (2019)
  • N. Chiang et al.

    Structured nonconvex optimization of large-scale energy systems using PIPS-NLP

    Proc. of the 18th Power Systems Computation Conference (PSCC), Wroclaw, Poland

    (2014)
  • A. De Juan et al.

    Multivariate Curve Resolution (MCR). solving the mixture analysis problem

    Analytical Methods

    (2014)
  • U. Gautam R. and Vanga S. and Ariese F.

    Review of multidimensional data processing approaches for Raman and infrared spectroscopy. 2(1):1-38.

    Epj Tech. Instrum.

    (2015)
  • W.E. Hart et al.

    Pyomo Optimization Modeling in Python

    (2017)
  • W.E. Hart

    Pyomo–Optimization Modeling in Python

    (2017)
  • Cited by (7)

    • Characterization of reactions and growth in automated continuous flow and bioreactor platforms—From linear DoE to model-based approaches

      2022, Simulation and Optimization in Process Engineering: The Benefit of Mathematical Methods in Applications of the Chemical Industry
    • Estimating variances and kinetic parameters from spectra across multiple datasets using KIPET

      2020, Chemometrics and Intelligent Laboratory Systems
      Citation Excerpt :

      An example of its use, with sample code, is included in Section 5. For more information on the structure of KIPET, the reader is referred to the original article [16]. The full procedure is shown graphically in Fig. 3.

    View all citing articles on Scopus
    View full text