Approximate maximum likelihood estimation for stochastic chemical kinetics
- Aleksandr Andreychenko^{1},
- Linar Mikeev^{1},
- David Spieler^{1} and
- Verena Wolf^{1}
https://doi.org/10.1186/1687-4153-2012-9
© Andreychenko et al.; licensee Springer. 2012
Received: 8 January 2012
Accepted: 7 July 2012
Published: 18 July 2012
Abstract
Recent experimental imaging techniques are able to tag and count molecular populations in a living cell. From these data, mathematical models are inferred and calibrated. If small populations are present, discrete-state stochastic models are widely used to describe the discreteness and randomness of molecular interactions. Based on time-series data of the molecular populations, the corresponding stochastic reaction rate constants can be estimated. This procedure is computationally very challenging, since the underlying stochastic process has to be solved for many different parameter values in order to obtain optimal estimates. Here, we focus on the maximum likelihood method and estimate rate constants, initial populations and parameters representing measurement errors.
Introduction
During the last decade stochastic models of networks of chemical reactions have become very popular. The reason is that the assumption that chemical concentrations change deterministically and continuously in time is not always appropriate for cellular processes. In particular, if certain substances in the cell are present in small concentrations, the resulting stochastic effects cannot be adequately described by deterministic models. In that case, discrete-state stochastic models are advantageous because they take into account the discrete random nature of chemical reactions. The theory of stochastic chemical kinetics provides a rigorously justified framework for the description of chemical reactions where the effects of molecular noise are taken into account[1]. It is based on discrete-state Markov processes that explicitly represent the reactions as state transitions between population vectors. When the molecule numbers are large, the solution of the deterministic description of a reaction network and the mean of the corresponding stochastic model agree up to a small approximation error. If, however, species with small populations are involved, then only a stochastic description can provide probabilities of events of interest such as probabilities of switching between different expression states in gene regulatory networks or the distribution of gene expression products. Moreover, even the mean behavior of the stochastic model can deviate considerably from the behavior of the deterministic model[2]. In such cases the parameters of the stochastic model rather than the parameters of the deterministic model have to be estimated[3–5].
Here, we consider noisy time series measurements of the system state as they are available from wet-lab experiments. Recent experimental imaging techniques such as high-resolution fluorescence microscopy can measure small molecule counts with measurement errors of less than one molecule[6]. We assume that the structure of the underlying reaction network is known but the stochastic reaction rate constants of the network are unknown parameters. Then we identify rate constants that maximize the likelihood of the time series data. Maximum likelihood estimators are the most popular estimators since they have desirable mathematical properties. Specifically, they become minimum variance unbiased estimators and are asymptotically normal as the sample size increases.
Our main contribution consists in devising an efficient algorithm for the numerical approximation of the likelihood and its derivatives w.r.t. the stochastic reaction rate constants. Furthermore, we show how similar techniques can be used to estimate the initial molecule numbers of a network as well as parameters related to the measurement error. We also present extensive experimental results that give insights about the identifiability of certain parameters. In particular, we consider a simple gene expression model and the identifiability of reaction rate constants w.r.t. varying observation interval lengths and varying numbers of time series. Moreover, for this system we investigate the identifiability of reaction rate constants if the state of the gene cannot be observed but only the number of mRNA molecules. For a more complex gene regulatory network, we present parameter estimation results where different combinations of proteins are observed. In this way we reason about the sensitivity of the estimation of certain parameters w.r.t. the protein types that are observed.
Previous parameter estimation techniques for stochastic models are based on Monte-Carlo sampling[3, 5] because the discrete state space of the underlying model is typically infinite in several dimensions and a priori a reasonable truncation of the state space is not available. Other approaches are based on Bayesian inference which can be applied both to deterministic and stochastic models[7–9]. In particular, approximate Bayesian inference can serve as a way to distinguish among a set of competing models[10]. Moreover, in the context of Bayesian inference linear noise approximations have been used to overcome the problem of large discrete state spaces[11].
Our method is not based on sampling but directly calculates the likelihood using a dynamic truncation of the state space. More precisely, we first show that the computation of the likelihood is equivalent to the evaluation of a product of vectors and matrices. This product includes the transition probability matrix of the associated continuous-time Markov process, i.e., the solution of the Kolmogorov differential equations (KDEs), which can be seen as a matrix version of the chemical master equation (CME). Solving the KDEs is infeasible because the state space of the underlying Markov model is very large or even infinite. Therefore we propose an iterative approximation algorithm during which the state space is truncated on the fly, that is, during a certain time interval we consider only those states that significantly contribute to the likelihood. This technique is based on ideas presented in [12], but here we additionally explain how the initial molecule numbers can be estimated and how an approximation of the standard deviation of the estimated parameters can be derived. Moreover, we provide more complex case studies and run extensive numerical experiments to assess the identifiability of certain parameters. In these experiments we assume that not all molecular populations can be observed and estimate parameters for different observation scenarios, i.e., we assume different numbers of observed cells and different observation interval lengths. We remark that this article is an extension of a previously published extended abstract[13].
The article is further organized as follows: After introducing the stochastic model in Section “Discrete-state stochastic model”, we discuss the maximum likelihood method in Section “Parameter inference” and present our approximation method in Section “Numerical approximation algorithm”. Finally, we report on experimental results for two reaction networks in Section “Numerical results”.
Discrete-state stochastic model
According to Gillespie’s theory of stochastic chemical kinetics, a well-stirred mixture of n molecular species in a volume with fixed size and fixed temperature can be represented as a continuous-time Markov chain {X(t),t ≥ 0}[1]. The random vector X(t)=(X_{1}(t),…,X_{ n }(t)) describes the chemical populations at time t, i.e., X_{ i }(t) is the number of molecules of type i ∈ {1,…,n} at time t. Thus, the state space of X is ${\mathbb{Z}}_{+}^{n}={\{0,1,\dots \}}^{n}$. The state changes of X are triggered by the occurrences of chemical reactions, which are of m different types. For j ∈ {1,…,m} let ${\mathbf{v}}_{j}\in {\mathbb{Z}}^{n}$ be the nonzero change vector of the j-th reaction type. Thus, if X(t)=x and the j-th reaction is possible in x, then X(t + dt)=x + v_{ j } is the state of the system after the occurrence of the j-th reaction within the infinitesimal time interval [t,t + dt).
Each reaction type has an associated propensity function, denoted by α_{1},…,α_{ m }, which is such that α_{ j }(x)·dt is the probability that, given X(t)=x, one instance of the j-th reaction occurs within [t,t + dt). The value α_{ j }(x) is proportional to the number of distinct reactant combinations in state x and to the reaction rate constant c_{ j }. The probability that a randomly selected pair of reactants collides and undergoes the j-th chemical reaction within [t,t + dt) is then given by c_{ j }dt. The value c_{ j } depends on the volume and the temperature of the system as well as on the microphysical properties of the reactant species.
Example 1
We consider the simple gene expression model described in [4] that involves three chemical species, namely DNA_{ON}, DNA_{OFF}, and mRNA, which are represented by the random variables X_{1}(t), X_{2}(t), and X_{3}(t), respectively. The three possible reactions are DNA_{ON}→DNA_{OFF}, DNA_{OFF}→DNA_{ON}, and DNA_{ON}→DNA_{ON}+ mRNA. Thus, v_{1}=(−1,1,0), v_{2}=(1,−1,0), v_{3}=(0,0,1). For a state x=(x_{1},x_{2},x_{3}), the propensity functions are α_{1}(x)=c_{1}·x_{1}, α_{2}(x)=c_{2}·x_{2}, and α_{3}(x)=c_{3}·x_{1}. Note that given the initial state x=(1,0,0), at any time, either the DNA is active or not, i.e. x_{1}=0 and x_{2}=1, or x_{1}=1 and x_{2}=0. Moreover, the state space of the model is infinite in the third dimension. For a fixed time instant t > 0, no upper bound on the number of mRNA is known a priori. All states x with ${x}_{3}\in {\mathbb{Z}}_{+}$ have positive probability if t > 0 but these probabilities will tend to zero as x_{3}→∞.
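The change vectors and propensity functions of Example 1 can be written down directly in code. The following is a minimal Python sketch, not the authors' implementation; the helper names `propensities` and `possible_reactions` and the numerical rate constants are assumptions chosen purely for illustration.

```python
# State x = (DNA_ON, DNA_OFF, mRNA); one change vector v_j per reaction type.
CHANGE_VECTORS = [(-1, 1, 0), (1, -1, 0), (0, 0, 1)]


def propensities(x, c):
    """alpha_1(x) = c1*x1, alpha_2(x) = c2*x2, alpha_3(x) = c3*x1."""
    return [c[0] * x[0], c[1] * x[1], c[2] * x[0]]


def possible_reactions(x, c):
    """List (j, successor state x + v_j, alpha_j(x)) for every reaction with alpha_j(x) > 0."""
    result = []
    pairs = zip(propensities(x, c), CHANGE_VECTORS)
    for j, (alpha, v) in enumerate(pairs, start=1):
        if alpha > 0.0:
            successor = tuple(x_i + v_i for x_i, v_i in zip(x, v))
            result.append((j, successor, alpha))
    return result


# Illustrative rate constants (assumed values, not taken from the article).
c = (0.05, 0.2, 0.4)
for j, successor, alpha in possible_reactions((1, 0, 0), c):
    print(f"reaction {j}: -> {successor} with propensity {alpha}")
```

From the initial state (1,0,0) only the first and the third reaction are possible, which reflects the observation that the gene is either active or inactive at any time.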
The CME
For a state $\mathbf{x}\in {\mathbb{Z}}_{+}^{n}$ and t ≥ 0, let p(x,t) denote the probability Pr(X(t)=x), i.e., the probability that the process is in state x at time t. Furthermore, let p(t) be the row vector with entries p(x,t) where we assume a fixed enumeration of all possible states. Given an initial distribution p(0), the evolution of p(t) is described by the CME
$\frac{d}{dt}\mathbf{p}(t)=\mathbf{p}(t)Q$ (1)
where Q is the infinitesimal generator matrix of X with Q(x,y)=α_{ j }(x) if y=x + v_{ j } and reaction type j is possible in state x. Note that, in order to simplify our presentation, we assume here that all vectors v_{ j } are distinct. All remaining entries of Q are zero except for the diagonal entries which are equal to the negative row sum. The ordinary first-order differential equation in (1) is a direct consequence of the Kolmogorov forward equation, but standard numerical solution techniques for systems of first-order linear equations cannot be applied to solve (1) because the number of nonzero entries in Q typically exceeds the available memory capacity for systems of realistic size. If the expected populations of all species remain small (at most a few hundred) then the CME can be efficiently approximated using projection methods[14–16] or fast uniformization methods[17, 18]. The idea of these methods is to avoid an exhaustive state space exploration and, depending on a certain time interval, restrict the analysis of the system to a subset of states.
with initial condition $\mathbf{s}_{\lambda}(0)=\frac{\partial}{\partial \lambda}\mathbf{p}_{\lambda}(0)$ since Q is independent of x_{ i }(0). Similar ODEs can be derived for higher order derivatives of the CME.
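For orientation, the derivative equations referred to here can be obtained by differentiating the CME with respect to a parameter λ. The block below is a hedged reconstruction rather than a quotation of the article: it assumes that p_λ(t) and Q_λ denote the probability vector and the generator for a given value of λ, and that s_λ(t) is the row vector of derivatives of p_λ(t) with respect to λ.

```latex
% Assumed notation: s_lambda(t) = d p_lambda(t) / d lambda.
% Case lambda = c_j (Q depends on lambda): product rule applied to d/dt p = p Q.
\frac{d}{dt}\mathbf{s}_{\lambda}(t)
  = \mathbf{s}_{\lambda}(t)\,Q_{\lambda}
  + \mathbf{p}_{\lambda}(t)\,\frac{\partial}{\partial\lambda}Q_{\lambda},
  \qquad \mathbf{s}_{\lambda}(0)=\mathbf{0}.

% Case lambda = x_i(0) (Q independent of lambda): only the initial condition carries lambda.
\frac{d}{dt}\mathbf{s}_{\lambda}(t)
  = \mathbf{s}_{\lambda}(t)\,Q,
  \qquad \mathbf{s}_{\lambda}(0)=\frac{\partial}{\partial\lambda}\mathbf{p}_{\lambda}(0).
```

In the first case the initial condition is the zero vector because the initial distribution does not depend on the rate constants.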
Parameter inference
Following the notation in [4], we assume that observations of the reaction network are made at time instances ${t}_{1},\dots ,{t}_{R}\in {\mathbb{R}}_{\ge 0}$ where t_{1} < ⋯ < t_{ R }. Since it is unrealistic to assume that all species can be observed, we assume w.l.o.g. that the species are ordered such that we have observations of X_{1},…,X_{ d } for some fixed d with 1 ≤ d ≤ n, i.e. O_{ i }(t_{ ℓ }) is the observed number of species i at time t_{ ℓ } for i ∈ {1,…,d} and ℓ ∈ {1,…,R}. Let O(t_{ ℓ })=(O_{1}(t_{ ℓ }),…,O_{ d }(t_{ ℓ })) be the corresponding vector of observations. Since these observations are typically subject to measurement errors, we assume that O_{ i }(t_{ ℓ })=X_{ i }(t_{ ℓ }) + ε_{ i }(t_{ ℓ }) where the error terms ε_{ i }(t_{ ℓ }) are independent and identically normally distributed with mean zero and standard deviation σ. Note that X_{ i }(t_{ ℓ }) is the true population of the i-th species at time t_{ ℓ }. Clearly, this implies that, conditional on X_{ i }(t_{ ℓ }), the random variable O_{ i }(t_{ ℓ }) is independent of all other observations as well as independent of the history of X before time t_{ ℓ }.
We assume further that we do not know the values of the rate constants c=(c_{1},…,c_{ m }) and our aim is to estimate these constants. Similarly, the initial populations x(0) and the exact standard deviation σ of the error terms are unknown and must be estimated. We remark that it is straightforward to extend the estimation framework such that a covariance matrix for a multivariate normal distribution of the error terms is estimated. In this way, different measurement errors of the species can be taken into account as well as dependencies between error terms.
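Let $\mathcal{L}$ denote the likelihood of the observations O(t_{1}),…,O(t_{ R }). We seek the maximum likelihood estimators
$(\mathbf{x}(0)^{*},\mathbf{c}^{*},\sigma^{*})=\arg\max_{\mathbf{x}(0),\mathbf{c},\sigma}\mathcal{L}$
where the maximum is taken over all σ > 0 and vectors x(0), c with all components strictly positive. This optimization problem is known as the maximum likelihood problem[19]. Note that x(0)^{∗}, c^{∗} and σ^{∗} are random variables because they depend on the (random) observations O(t_{1}),…,O(t_{ R }).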
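In matrix-vector form, the likelihood is $\mathcal{L}=\mathbf{p}(t_{0})P_{1}W_{1}P_{2}W_{2}\cdots P_{R}W_{R}\mathbf{e}$, where P_{ ℓ } is the transition probability matrix of X for the time interval [t_{ℓ−1},t_{ ℓ }). Here, e is the column vector with all entries equal to one and W_{ ℓ } is a diagonal matrix of the same size as P_{ ℓ } whose diagonal entry for state x_{ ℓ } is the weight w(x_{ ℓ }), with ℓ ∈ {1,…,R}.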
The weight w(x_{ ℓ }) depends only on the first d components of x_{ ℓ }, i.e. the populations of the unobserved species have no influence on the weight.
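Under the measurement model stated above (independent normal errors with mean zero and standard deviation σ on each observed species), a weight of this kind can be sketched as a product of normal densities. The function names below are hypothetical, and the exact form of w in the article may differ in notation; this is only an illustration of why unobserved species drop out.

```python
import math


def normal_pdf(z, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma at z."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def weight(x, observation, sigma):
    """Product of normal densities over the d observed species.

    `observation` has length d <= len(x). zip() stops after d components,
    so the populations of the unobserved species are ignored.
    """
    w = 1.0
    for x_i, o_i in zip(x, observation):
        w *= normal_pdf(o_i - x_i, sigma)
    return w
```

Two states that agree on the observed species, for example (1,0,5) and (1,0,99) when only the first two species are observed, receive the same weight.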
The derivatives of $\mathcal{L}$ w.r.t. x_{ i }(0) and σ are derived analogously. The only differences are that p(t_{0}) depends on x_{ i }(0), and that P_{1},…,P_{ R } are independent of σ whereas W_{1},…,W_{ R } depend on σ. It is also important to note that expressions for partial derivatives of second order can be derived in a similar way. These derivatives can then be used for an efficient gradient-based local optimization.
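For K independent observation sequences with likelihoods $\mathcal{L}_{1},\dots ,\mathcal{L}_{K}$, the total log-likelihood and its derivatives are sums of individual terms, i.e., $\log \mathcal{L}=\sum_{k=1}^{K}\log \mathcal{L}_{k}$ and $\frac{\partial}{\partial \lambda}\log \mathcal{L}=\sum_{k=1}^{K}\frac{1}{\mathcal{L}_{k}}\frac{\partial}{\partial \lambda}\mathcal{L}_{k}$, where λ is c_{ j }, x_{ i }(0) or σ. It is also important to note that only the weights w(x_{ ℓ }) depend on k, that is, on the observed sequence O^{ k }(t_{1}),…,O^{ k }(t_{ R }). Thus, when we compute $\mathcal{L}_{k}$ based on (9) we use for all k the same transition matrices P_{1},…,P_{ R } and the same initial conditions p(t_{0}), but possibly different matrices W_{1},…,W_{ R }.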
Numerical approximation algorithm
We compute the likelihood iteratively. Starting with u(t_{0})=p(t_{0}), we solve R systems of ODEs
$\frac{d}{dt}\tilde{\mathbf{u}}(t)=\tilde{\mathbf{u}}(t)Q$ (13)
with initial condition $\tilde{\mathbf{u}}(t_{\ell -1})=\mathbf{u}(t_{\ell -1})$ for the time interval [t_{ℓ−1},t_{ ℓ }) where ℓ ∈ {1,…,R}. After solving the ℓ-th system of ODEs we set $\mathbf{u}(t_{\ell})=\tilde{\mathbf{u}}(t_{\ell})W_{\ell}$ and finally compute $\mathcal{L}=\mathbf{u}(t_{R})\mathbf{e}$. We remark that this is the same as solving the CME for different initial conditions. Because the state space is very large or even infinite, we use the dynamic truncation of the state space that we proposed in previous work[17]. The idea is to consider only the most relevant equations of the system (13), i.e., the equations that correspond to those states x where the relative contribution $\tilde{u}(\mathbf{x},t)/(\tilde{\mathbf{u}}(t_{\ell})\mathbf{e})$ is greater than a threshold δ. Since the contribution of a state might increase or decrease during the integration, we add and remove equations on the fly depending on the current contribution of the corresponding state. Note that the structure of the CME allows us to determine in a simple way which states will become relevant in the next integration step. For a small time step of length h we know that the fraction of probability mass moved from state x−v_{ j } to x is approximately α_{ j }(x−v_{ j })h. Thus, we can simply check whether the probability inflow that a state receives exceeds the threshold. In this case we consider the corresponding equation in (13). Otherwise, if a state does not receive enough probability inflow, we do not consider it in (13). For more details on this technique we refer to [17].
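The derivatives of the likelihood w.r.t. c_{ j } are approximated in the same way. Starting with the zero vector u_{ j }(t_{0})=0, we solve, simultaneously with (13),
$\frac{d}{dt}\tilde{\mathbf{u}}_{j}(t)=\tilde{\mathbf{u}}_{j}(t)Q+\tilde{\mathbf{u}}(t)\frac{\partial}{\partial c_{j}}Q$ (14)
with initial condition $\tilde{\mathbf{u}}_{j}(t_{\ell -1})=\mathbf{u}_{j}(t_{\ell -1})$ for the time interval [t_{ℓ−1},t_{ ℓ }). As above, we set $\mathbf{u}_{j}(t_{\ell})=\tilde{\mathbf{u}}_{j}(t_{\ell})W_{\ell}$ and obtain $\frac{\partial}{\partial c_{j}}\mathcal{L}$ as u_{ j }(t_{ R })e.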
Solving (13) and (14) simultaneously is equivalent to the computation of the partial derivatives in (2) with different initial conditions. Numerical experiments show that the approximation errors of the likelihood and its derivatives are of the same order of magnitude as those of the transient probabilities and their derivatives. For instance, for a finite-state enzymatic reaction system that is small enough to be solved without truncation we found that the maximum absolute error in the approximations of the vectors p(t) and s_{ λ }(t) is 10^{−8} if the truncation threshold is δ=10^{−15} (details not shown).
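The iteration for the likelihood itself (integrate, multiply by the weights, repeat, sum up) can be sketched for the gene expression model of Example 1 as follows. This is a simplified illustration under several assumptions, not the authors' C++ implementation: it uses explicit Euler steps of length h instead of a higher-order integrator, it adds every successor state and then prunes states whose relative contribution is below δ, it computes no derivatives, and all function names and default values are hypothetical.

```python
import math

CHANGE_VECTORS = [(-1, 1, 0), (1, -1, 0), (0, 0, 1)]


def propensities(x, c):
    return (c[0] * x[0], c[1] * x[1], c[2] * x[0])


def weight(x, obs, sigma):
    """Product of normal densities over the observed (leading) components."""
    w = 1.0
    for x_i, o_i in zip(x, obs):
        z = (o_i - x_i) / sigma
        w *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return w


def euler_step(u, c, h, delta):
    """One explicit Euler step of du/dt = u Q on the current state set,
    followed by removal of states with relative contribution <= delta."""
    new = {}
    for x, mass in u.items():
        alphas = propensities(x, c)
        new[x] = new.get(x, 0.0) + mass * (1.0 - h * sum(alphas))
        for alpha, v in zip(alphas, CHANGE_VECTORS):
            if alpha > 0.0:
                y = tuple(x_i + v_i for x_i, v_i in zip(x, v))
                new[y] = new.get(y, 0.0) + mass * alpha * h  # inflow into y
    total = sum(new.values())
    return {x: m for x, m in new.items() if m > delta * total}


def likelihood(x0, c, sigma, times, observations, h=1e-3, delta=1e-15):
    u = {tuple(x0): 1.0}  # u(t_0): point mass at the initial state
    t = 0.0
    for t_l, obs in zip(times, observations):
        steps = max(1, round((t_l - t) / h))
        step = (t_l - t) / steps
        for _ in range(steps):
            u = euler_step(u, c, step, delta)
        # Multiplication with the diagonal weight matrix W_l.
        u = {x: m * weight(x, obs, sigma) for x, m in u.items()}
        t = t_l
    return sum(u.values())  # corresponds to u(t_R) e
```

Storing the vector as a dictionary keyed by state means that only states currently considered relevant occupy memory, which is the point of the dynamic truncation. The step length must be small enough that h times the exit rate of every tracked state stays below one.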
In the case of K observation sequences we repeat the above algorithm in order to sequentially compute $\mathcal{L}_{k}$ for k ∈ {1,…,K}. We exploit (11) and (12) to compute the total log-likelihood and its derivatives as a sum of individual terms. In a similar way, second derivatives can be approximated. Obviously, it is possible to parallelize the algorithm by computing $\mathcal{L}_{k}$ in parallel for all k.
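Combining the per-sequence results is a matter of summing logarithms and ratios. The following sketch assumes that the K sequences are independent and that the values L_k and their derivatives with respect to one parameter λ have already been computed; the function name is hypothetical.

```python
import math


def total_log_likelihood(likelihoods, derivatives):
    """Combine per-sequence likelihoods L_k and derivatives dL_k/dlambda.

    Returns (log L, d log L / d lambda) where
    log L = sum_k log L_k and d log L / d lambda = sum_k (dL_k/dlambda) / L_k.
    """
    log_l = sum(math.log(l_k) for l_k in likelihoods)
    d_log_l = sum(d_k / l_k for l_k, d_k in zip(likelihoods, derivatives))
    return log_l, d_log_l
```

Working with the sum of logarithms rather than the product of the L_k avoids numerical underflow when K is large.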
In order to find values for which the likelihood becomes maximal, global optimization techniques can be applied. Those techniques usually use a heuristic to choose different initial values of the parameters and then follow the gradient to find local optima of the likelihood. In this step the algorithm proposed above is used since it approximates the gradient of the likelihood. The approximated global optimum is then chosen as the best of the local optima, i.e., we determine those values of the parameters that give the largest likelihood. Clearly, this is an approximation and we cannot guarantee that the global optimum was found. Note that this would also be the case if we could compute the exact likelihood. If, however, a good heuristic for the starting points is chosen and the number of starting points is large, then it is likely that the approximation is accurate. Moreover, since we have approximated the second derivative of the log-likelihood, we can compute the entries of the Fisher information matrix and use this to approximate the standard deviations of the estimated parameters, i.e., we consider the square roots of the diagonal entries of the inverse of a matrix H which is the Hessian matrix of the negative log-likelihood. Assuming that the second derivative of the log-likelihood is computed exactly, these entries asymptotically tend to the standard deviations of the estimated parameters.
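The step from the Hessian H to approximate standard deviations can be illustrated on a toy problem. The sketch below deliberately does not use the reaction network: it assumes a sample from a normal distribution with unknown mean and standard deviation, approximates H by central finite differences (the article instead obtains second derivatives from ODEs), and inverts the resulting 2×2 matrix by hand. All names are hypothetical.

```python
import math


def neg_log_likelihood(theta, data):
    """Negative log-likelihood of an i.i.d. normal sample, theta = (mu, sigma)."""
    mu, sigma = theta
    terms = sum(0.5 * ((y - mu) / sigma) ** 2 + math.log(sigma) for y in data)
    return terms + 0.5 * len(data) * math.log(2.0 * math.pi)


def hessian(f, theta, eps=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(d_i, d_j):
                t = list(theta)
                t[i] += d_i
                t[j] += d_j
                return f(t)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4.0 * eps * eps)
    return H


def std_devs_2x2(H):
    """Square roots of the diagonal entries of the inverse of a 2x2 matrix H."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return (math.sqrt(H[1][1] / det), math.sqrt(H[0][0] / det))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
mle = (3.0, math.sqrt(2.0))  # sample mean and (biased) sample standard deviation
H = hessian(lambda theta: neg_log_likelihood(theta, data), mle)
sd_mu, sd_sigma = std_devs_2x2(H)
```

For this toy model the result can be checked analytically: at the maximum likelihood estimate the Hessian is diagonal with entries n/σ² and 2n/σ², so the approximate standard deviations are σ/√n and σ/√(2n).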
We remark that the approximation proposed above becomes infeasible if the reaction network contains species with high molecule numbers since in this case the number of states that have to be considered is very large. A numerical approximation of the likelihood is, like the solution of the CME, only possible if the expected populations of all species remain small (at most a few hundred) and if the dimension of the process is not too large. Moreover, if many parameters have to be estimated, the search space of the optimization problem may become infeasibly large. It is, however, straightforward to parallelize local optimizations starting from different initial points.
Numerical results
In this section we present numerical results of our parameter estimation algorithm applied to two models, the simple gene expression model in Example 1 and a multi-attractor model. The corresponding SBML files are provided as Additional files 1 and 2. For both models, we generated time series data using Monte-Carlo simulation where we added white noise to represent measurement errors, i.e. we added random terms to the populations that follow a normal distribution with mean zero and a standard deviation of σ. Our algorithm for the approximation of the likelihood is implemented in C++ and linked to MATLAB’s optimization toolbox[20] which we use to minimize the negative log-likelihood. The global optimization method (MATLAB’s GlobalSearch[21]) uses a scatter-search algorithm to generate a set of trial points (potential starting points) and heuristically decides when to perform a local optimization. We ran our experiments on an Intel Core i7 at 2.8 GHz with 8 GB main memory.
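The data generation described here can be sketched for the gene expression model as a stochastic simulation that is sampled at the observation times, with normally distributed noise added to each population. The code below is an illustration only: it assumes Gillespie's direct method as the Monte-Carlo simulator, and the function name, rate constants and observation times are hypothetical.

```python
import random

CHANGE_VECTORS = [(-1, 1, 0), (1, -1, 0), (0, 0, 1)]


def propensities(x, c):
    return (c[0] * x[0], c[1] * x[1], c[2] * x[0])


def noisy_observations(x0, c, sigma, times, rng):
    """Simulate one trajectory and return noisy observations at the given times."""
    x, t, out = tuple(x0), 0.0, []
    for t_obs in times:
        while True:
            alphas = propensities(x, c)
            total = sum(alphas)
            wait = rng.expovariate(total) if total > 0.0 else float("inf")
            if t + wait > t_obs:
                # The waiting time is memoryless, so discarding it and
                # restarting the clock at t_obs does not bias the simulation.
                t = t_obs
                break
            t += wait
            # Choose reaction j with probability alpha_j / total.
            j = rng.choices(range(len(alphas)), weights=alphas)[0]
            x = tuple(x_i + v_i for x_i, v_i in zip(x, CHANGE_VECTORS[j]))
        # Measurement error: normal with mean zero and standard deviation sigma.
        out.append(tuple(x_i + rng.gauss(0.0, sigma) for x_i in x))
    return out


rng = random.Random(1)  # fixed seed so that the run is reproducible
observations = noisy_observations((1, 0, 0), (0.05, 0.2, 0.4), 0.1, [1.0, 2.0, 3.0], rng)
```

Calling the function K times with the same arguments yields K independent observation sequences of length R, one for each simulated cell.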
Simple gene expression
We ran experiments for varying values of K and R (K,R ∈ {1,2,5,10,20,50,100}) to gain insight into whether for this network it is more advantageous to have many observation sequences with long observation intervals or few observation sequences with a short time between two successive observations. In addition, we ran the same experiments with the restriction that only the number of mRNA molecules was observable but not the state of the gene. In both cases we approximated the standard deviations of our estimators as a measure of quality in two ways: by repeating our estimation procedure 100 times, and by using the Fisher information matrix as explained at the end of the previous section. We used 100 trial points for the global optimization procedure and chose tighter constraints than above for the rate constants ([0.01,1] for c_{1} and [0.1,10] for c_{2},c_{3}) to keep the total running time manageable.
First, we remark that neither the quality of the estimation nor the running time of our algorithm depends significantly on whether we observe the state of the gene in addition to the mRNA level or not. Moreover, for all of the parameters, the estimates converge more quickly towards the true values along the K axis than along the R axis, and the standard deviations also decrease faster. Consequently, at least for the gene expression model, it is more advantageous to increase the number of observation sequences than the number of measurements per sequence. For example, K=100 sequences with only one observation each already provide enough information to estimate c_{1} up to a relative error of around 2.1%. Unfortunately, in this case the computation time is the highest since we have to compute K individual likelihoods (one for each observation sequence). Moreover, if R is small then the truncation of the state space is less efficient. The reason is that we have to integrate for a long time until we multiply with the weight matrix W_{ ℓ }. After this multiplication we decide which states contribute significantly to the likelihood and which states are neglected. We can, however, trade off accuracy against running time by varying K.
For the measurement noise parameter σ we see that it is more advantageous to increase R. Even five observation sequences with a high number of observations per sequence (R=100) suffice to estimate the noise up to a relative error of around 10.2%. For the estimation of the initial conditions, both K and R seem to play an equally important role.
Different approximations of the standard deviations of the estimators
Method | K | R | c_{1} | c_{2} | c_{3} | σ | mRNA(0) |
---|---|---|---|---|---|---|---|
Fisher inf. matrix | 10 | 10 | 0.0545104 | 0.561963 | 0.935324 | 0.364339 | 0.639471 |
100 experiments | 10 | 10 | 0.0358142 | 0.198700 | 0.262223 | 0.392884 | 0.490305 |
Fisher inf. matrix | 20 | 20 | 0.0324508 | 0.299487 | 0.451476 | 0.174095 | 0.594820 |
100 experiments | 20 | 20 | 0.0304157 | 0.167431 | 0.287471 | 0.134506 | 0.436059 |
Fisher inf. matrix | 50 | 50 | 0.0139185 | 0.110709 | 0.152229 | 0.0440282 | 0.238033 |
100 experiments | 50 | 50 | 0.0140331 | 0.078516 | 0.146232 | 0.0353837 | 0.183888 |
Fisher inf. matrix | 100 | 100 | 0.00866066 | 0.0548249 | 0.0728129 | 0.0182564 | 0.208469 |
100 experiments | 100 | 100 | 0.00691956 | 0.0430123 | 0.0641821 | 0.0217544 | 0.187968 |
Multi-attractor model
Chemical reactions of the multi-attractor model
Reactants | Reaction | Products |
---|---|---|
PaxDna | $\stackrel{p}{\to}$ | PaxDna + PaxProt |
PaxProt | $\stackrel{d}{\to}$ | ∅ |
PaxDna + DeltaProt | $\stackrel{b}{\to}$ | PaxDnaDeltaProt |
PaxDnaDeltaProt | $\stackrel{u}{\to}$ | PaxDna + DeltaProt |
MafADna | $\stackrel{p}{\to}$ | MafADna + MafAProt |
MafAProt | $\stackrel{d}{\to}$ | ∅ |
MafADna + PaxProt | $\stackrel{b}{\to}$ | MafADnaPaxProt |
MafADnaPaxProt | $\stackrel{u}{\to}$ | MafADna + PaxProt |
MafADnaPaxProt | $\stackrel{p}{\to}$ | MafADnaPaxProt + MafAProt |
MafADna + MafAProt | $\stackrel{b}{\to}$ | MafADnaMafAProt |
MafADnaMafAProt | $\stackrel{u}{\to}$ | MafADna + MafAProt |
MafADnaMafAProt | $\stackrel{p}{\to}$ | MafADnaMafAProt + MafAProt |
MafADna + DeltaProt | $\stackrel{b}{\to}$ | MafADnaDeltaProt |
MafADnaDeltaProt | $\stackrel{u}{\to}$ | MafADna + DeltaProt |
DeltaDna | $\stackrel{p}{\to}$ | DeltaDna + DeltaProt |
DeltaProt | $\stackrel{d}{\to}$ | ∅ |
DeltaDna + PaxProt | $\stackrel{b}{\to}$ | DeltaDnaPaxProt |
DeltaDnaPaxProt | $\stackrel{u}{\to}$ | DeltaDna + PaxProt |
DeltaDnaPaxProt | $\stackrel{p}{\to}$ | DeltaDnaPaxProt + DeltaProt |
DeltaDna + MafAProt | $\stackrel{b}{\to}$ | DeltaDnaMafAProt |
DeltaDnaMafAProt | $\stackrel{u}{\to}$ | DeltaDna + MafAProt |
DeltaDna + DeltaProt | $\stackrel{b}{\to}$ | DeltaDnaDeltaProt |
DeltaDnaDeltaProt | $\stackrel{u}{\to}$ | DeltaDna + DeltaProt |
DeltaDnaDeltaProt | $\stackrel{p}{\to}$ | DeltaDnaDeltaProt + DeltaProt |
We observe in Figure 6 that, as expected, the accuracy of the estimation and the running time of our algorithm are best when we have full observability of the system and get worse with an increasing number of unobservable species. Still, the estimation quality is very high for almost all combinations and parameters when five observation sequences are provided. When only one observation sequence is given (K=1), the parameter estimation becomes unreliable and time consuming. This comes from the fact that the quality of the approximation highly depends on the generated observation sequence. It is possible to get much better and faster approximations with a single observation sequence. However, we did not optimize our results but generated one random observation sequence and ran our estimation procedure once based on this.
Production rate estimation in the multi-attractor model
Protein | K | Estimated rate constant | Standard deviation | Time (hours) | Observed proteins |
---|---|---|---|---|---|
PaxProt | 1 | 10.0 | 13.6159 | 7.45 | MafAProt, DeltaProt |
PaxProt | 5 | 0.5693 | 2.1842 | 6.34 | MafAProt, DeltaProt |
MafAProt | 1 | 4.9998 | 4.9884 | 11.62 | PaxProt, DeltaProt |
MafAProt | 5 | 5.4853 | 2.3873 | 13.86 | PaxProt, DeltaProt |
DeltaProt | 1 | 2.5453 | 1.8075 | 4.35 | PaxProt, MafAProt |
DeltaProt | 5 | 5.3646 | 1.4682 | 12.39 | PaxProt, MafAProt |
Finally, we remark that for the multi-attractor model it seems difficult to predict whether, for a given parameter, the observation of a certain set of proteins yields good accuracy or not. It can, however, be hypothesized that, if we want to accurately estimate the rate constant of a certain chemical reaction, then we should observe as many of the involved species as possible. Moreover, it is plausible that constants of reactions that occur less often are more difficult to estimate (such as the production of PaxProt). In such a case more observation sequences are necessary to provide reliable information about the speed of the reaction.
Conclusion
Parameter inference for stochastic models of cellular processes demands huge computational resources. We proposed an efficient numerical method to approximate maximum likelihood estimators for a given set of observations. We consider the case where the observations are subject to measurement errors and where only the molecule numbers of some of the chemical species are observed at certain points in time. In our experiments we show that if the observations provide sufficient information then parameters can be accurately identified. If only little information is available then the approximations of the standard deviations of the estimators indicate whether more observations are necessary to accurately calibrate certain parameters.
As future work we plan a comparison of our technique to parameter estimation based on Bayesian inference. In addition, we will examine whether a combination of methods based on prior knowledge and the maximum likelihood method is useful. Future plans further include parameter estimation methods for systems where some chemical species have small molecule numbers while others have large ones, rendering a purely discrete representation infeasible. In such cases, hybrid models are advantageous where large populations are represented by continuous deterministic variables while small populations are still described by discrete random variables[23].
Declarations
Acknowledgements
This research has been partially funded by the German Research Council (DFG) as part of the Cluster of Excellence on Multimodal Computing and Interaction at Saarland University and the Transregional Collaborative Research Center “Automatic Verification and Analysis of Complex Systems” (SFB/TR 14 AVACS).
References
- [1] Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem 1977, 81(25):2340-2361. 10.1021/j100540a008
- [2] Loinger A, Lipshtat A, Balaban NQ, Biham O: Stochastic simulations of genetic switch systems. Phys. Rev. E 2007, 75: 021904.
- [3] Tian T, Xu S, Gao J, Burrage K: Simulated maximum likelihood method for estimating kinetic rates in gene expression. Bioinformatics 2007, 23: 84-91. 10.1093/bioinformatics/btl552
- [4] Reinker S, Altman R, Timmer J: Parameter estimation in stochastic biochemical reactions. IEEE Proc. Syst. Biol 2006, 153: 168-178.
- [5] Uz B, Arslan E, Laurenzi I: Maximum likelihood estimation of the kinetics of receptor-mediated adhesion. J. Theor. Biol 2010, 262(3):478-487. 10.1016/j.jtbi.2009.10.015
- [6] Golding I, Paulsson J, Zawilski S, Cox E: Real-time kinetics of gene activity in individual bacteria. Cell 2005, 123(6):1025-1036. 10.1016/j.cell.2005.09.031
- [7] Boys R, Wilkinson D, Kirkwood T: Bayesian inference for a discretely observed stochastic kinetic model. Stat. Comput 2008, 18: 125-135. 10.1007/s11222-007-9043-x
- [8] Higgins JJ: Bayesian inference and the optimality of maximum likelihood estimation. Int. Stat. Rev 1977, 45: 9-11. 10.2307/1402999
- [9] Gillespie CS, Golightly A: Bayesian inference for generalized stochastic population growth models with application to aphids. J. R. Stat. Soc. Ser. C 2010, 59(2):341-357. 10.1111/j.1467-9876.2009.00696.x
- [10] Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 2009, 6(31):187-202. 10.1098/rsif.2008.0172
- [11] Komorowski M, Finkenstädt B, Harper C, Rand D: Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics 2009, 10: 343.
- [12] Andreychenko A, Mikeev L, Spieler D, Wolf V: Parameter Identification for Markov Models of Biochemical Reactions. In Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, Volume 6806 of Lecture Notes in Computer Science. Springer; 2011:83-98.
- [13] Andreychenko A, Mikeev L, Spieler D, Wolf V: Approximate maximum likelihood estimation for stochastic chemical kinetics. In Computational Systems Biology - 8th International Workshop, WCSB 2011, Zürich, Switzerland, June 6-8, 2011. Proceedings. Tampere International Center for Signal Processing. TICSP series # 57; 2011.
- [14] Henzinger TA, Mateescu M, Wolf V: Sliding Window Abstraction for Infinite Markov Chains. In Computer Aided Verification, 21st International Conference, CAV 2009, Grenoble, France, June 26 - July 2, 2009. Proceedings, Volume 5643 of Lecture Notes in Computer Science. Springer; 2009:337-352.
- [15] Munsky B, Khammash M: The finite state projection algorithm for the solution of the chemical master equation. J. Chem. Phys 2006, 124: 044144.
- [16] Burrage K, Hegland M, Macnamara F, Sidje B: A Krylov-based finite state projection algorithm for solving the chemical master equation arising in the discrete modelling of biological systems. In Proceedings of the Markov 150th Anniversary Conference. Boson Books, Bitingduck Press; 2006:21-38.
- [17] Mateescu M, Wolf V, Didier F, Henzinger T: Fast adaptive uniformisation of the chemical master equation. IET Syst. Biol 2010, 4(6):441-452. 10.1049/iet-syb.2010.0005
- [18] Sidje R, Burrage K, MacNamara S: Inexact uniformization method for computing transient distributions of Markov chains. SIAM J. Sci. Comput 2007, 29(6):2562-2580. 10.1137/060662629
- [19] Ljung L: System Identification: Theory for the User. 1998.
- [20] Global Optimization Toolbox: User’s Guide (r2011b). Mathworks 2011 [http://www.mathworks.com/help/pdf_doc/gads/gads_tb.pdf]
- [21] Ugray Z, Lasdon L, Plummer JC, Glover F, Kelly J, Marti R: Scatter search and local NLP solvers: a multistart framework for global optimization. INFORMS J. Comput 2007, 19(3):328-340. 10.1287/ijoc.1060.0175
- [22] Zhou JX, Brusch L, Huang S: Predicting pancreas cell fate decisions and reprogramming with a hierarchical multi-attractor model. PLoS ONE 2011, 6(3):e14752. 10.1371/journal.pone.0014752
- [23] Henzinger TA, Mikeev L, Mateescu M, Wolf V: Hybrid numerical solution of the chemical master equation. In Computational Methods in Systems Biology, 8th International Conference, CMSB 2010, Trento, Italy, September 29 - October 1, 2010. Proceedings. ACM; 2010:55-65.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.