Weighted next reaction method and parameter selection for efficient simulation of rare events in biochemical reaction systems
Zhouyi Xu^{1} and Xiaodong Cai^{1}
DOI: 10.1186/1687-4153-2011-797251
© Xu and Cai; licensee Springer. 2011
Received: 19 July 2010
Accepted: 25 July 2011
Published: 25 July 2011
Abstract
The weighted stochastic simulation algorithm (wSSA) recently developed by Kuwahara and Mura and the refined wSSA proposed by Gillespie et al. based on the importance sampling technique open the door for efficient estimation of the probability of rare events in biochemical reaction systems. In this paper, we first apply the importance sampling technique to the next reaction method (NRM) of the stochastic simulation algorithm and develop a weighted NRM (wNRM). We then develop a systematic method for selecting the values of importance sampling parameters, which can be applied to both the wSSA and the wNRM. Numerical results demonstrate that our parameter selection method can substantially improve the performance of the wSSA and the wNRM in terms of simulation efficiency and accuracy.
1 Introduction
Biochemical reaction systems in living cells exhibit significant stochastic fluctuations due to the small number of molecules involved in processes such as the transcription and translation of genes [1]. A number of exact [2–7] or approximate [8–19] simulation algorithms have been developed for simulating the stochastic dynamics of such systems. Recent research shows that some rare events, occurring in a biochemical reaction system with an extremely small probability within a specified limited time, can have profound and sometimes devastating effects [20, 21]. Hence, it is important that computational simulation and analysis of systems with critical rare events can efficiently capture such rare events. However, existing exact simulation methods such as Gillespie's exact SSA [2, 3] often require prohibitive computation to estimate the probability of a rare event, while the approximate methods may not be able to estimate such a probability accurately.
The weighted stochastic simulation algorithm (wSSA) recently developed by Kuwahara and Mura [22] based on the importance sampling technique enables one to efficiently estimate the probability of a rare event. However, the wSSA does not provide any method for selecting optimal values for the importance sampling parameters. More recently, Gillespie et al. [23] analyzed the accuracy of the results yielded by the wSSA and proposed a refined wSSA that employs a try-and-test method for selecting optimal parameter values. It was shown that the refined wSSA could further improve the performance of the wSSA. However, the try-and-test method requires some initial guessing about the sets of values from which the parameters are chosen. If the guessed values do not include the optimal value, then one cannot get appropriate values for the parameters. Moreover, if the number of parameters is greater than one, a very large set of values needs to be guessed and tested, which may increase the likelihood of missing the optimal values and also increase the computational overhead.
In this paper, we first apply the importance sampling technique to the next reaction method (NRM) of the SSA [4] and develop a weighted NRM (wNRM) as an alternative to the wSSA. We then develop a systematic method for selecting optimal values for the importance sampling parameters that can be incorporated into both the wSSA and the wNRM. Our method does not need an initial guess and thus can guarantee near-optimal values for the parameters. Our numerical results in Section 5 demonstrate that the variance of the estimated probability of the rare event provided by the wSSA and wNRM with our parameter selection method can be more than one order of magnitude lower than that provided by the wSSA or the refined wSSA for a given number of simulation runs. Moreover, the wSSA and wNRM with our parameter selection method require less simulation time than the refined wSSA for the same number of simulation runs. While this paper was under review, a method named the doubly weighted SSA (dwSSA) was developed to automatically choose parameter values for the wSSA [24]. The dwSSA reduces the computational overhead required by the wSSA and the refined wSSA to select parameter values, but it produces a variance for the estimated probability similar to that of the refined wSSA.
The remaining part of this paper is organized as follows. In Section 2, we first describe the system setup and then briefly review Gillespie's exact SSA [2, 3], the wSSA [22] and the refined wSSA [23]. In Section 3, we develop the wNRM. In Section 4, we develop a systematic method for selecting optimal values for the importance sampling parameters and incorporate the parameter selection procedure into both the wSSA and the wNRM. In Section 5, we give some numerical examples that illustrate the advantages of our parameter selection method. Finally, in Section 6, we draw several conclusions.
2 Weighted stochastic simulation algorithms
2.1 System description
2.2 Gillespie's exact SSA
where a_{0}(x) = Σ_{ m = 1}^{ M } a_{ m }(x). Therefore, Gillespie's direct method (DM) for the SSA generates a realization of τ and μ according to the PDF (3) and the PMF (4), respectively, in each step of the simulation, and then updates the system state as X(t + τ) = x + ν_{ μ }.
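As a concrete illustration of this step (not code from the paper), one DM iteration can be sketched in Python; the propensity functions are passed in as plain callables:

```python
import math
import random

def dm_step(x, propensities, rng):
    """One step of Gillespie's direct method.

    x: current state (tuple of molecule counts)
    propensities: list of functions a_m(x) >= 0
    Returns (tau, mu): the waiting time drawn from the exponential PDF
    with rate a_0(x), and the index of the firing channel drawn from
    the PMF a_mu(x)/a_0(x).
    """
    a = [am(x) for am in propensities]
    a0 = sum(a)
    if a0 == 0.0:
        return math.inf, None                      # no reaction can fire
    tau = -math.log(1.0 - rng.random()) / a0       # tau ~ Exp(a0)
    r2 = rng.random() * a0
    acc, mu = 0.0, len(a) - 1
    for m, am in enumerate(a):                     # smallest mu with cumulative sum > r2
        acc += am
        if acc > r2:
            mu = m
            break
    return tau, mu
```

Repeating this step and applying the state change ν_{μ} after each firing reproduces the DM loop described above.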
2.3 Weighted SSA
If we employ the SSA to estimate P(E_{ R }), we would have to make a large number n of simulation runs, with each starting at time 0 in state x_{0} and terminating either when some state x ∈ Ω is first reached or when the system time reaches T. If k is the number of those n runs that terminate for the first reason, then P(E_{ R }) is estimated as k/n. Since P(E_{ R }) ≪ 1, n should be very large to get a reasonably accurate estimate of P(E_{ R }). The wSSA employs the importance sampling technique to reduce the number of runs needed to estimate P(E_{ R }).
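To make the brute-force k/n estimate concrete, here is an illustrative unweighted SSA estimator for a simple birth–death model; the rates and threshold are placeholders of our own, not the paper's notation:

```python
import math
import random

def naive_rare_event_prob(n, T, theta, x0, c1, c2, seed=0):
    """Unweighted SSA estimate of P(E_R) = k/n: the fraction of n runs
    in which the species count reaches theta before time T.

    Hypothetical birth-death model: birth with constant propensity c1,
    death with propensity c2 * x (illustrative only).
    """
    rng = random.Random(seed)
    k = 0
    for _ in range(n):
        t, x = 0.0, x0
        while t <= T:
            if x >= theta:                            # event reached before T
                k += 1
                break
            a1, a2 = c1, c2 * x                       # propensities
            a0 = a1 + a2
            t += -math.log(1.0 - rng.random()) / a0   # tau ~ Exp(a0)
            x += 1 if rng.random() * a0 < a1 else -1  # pick channel
    return k / n
```

For a moderately unlikely threshold the estimate is usable, but for a genuinely rare threshold essentially every run of a feasible batch misses the event and the estimator returns 0, which is the motivation for the wSSA.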
which can be obtained in each simulation step.
Kuwahara and Mura [22] did not provide any method for choosing γ_{ μ }, although their numerical results with some pre-specified γ_{ μ } for several reaction systems demonstrated that the wSSA could reduce computation substantially. Gillespie et al. [23] analyzed the variance of the estimate obtained from the wSSA and refined the wSSA by proposing a try-and-test method for choosing γ_{ μ }. In the try-and-test method, several sets of values are pre-specified for γ_{ μ }, μ = 1, ..., M. A relatively small number of simulation runs of the standard SSA are made for each set of values to obtain an estimate of the variance of the estimate, and then the set of values that yields the smallest variance is chosen. Although the try-and-test method provides a way of choosing γ_{ μ }, it requires some guessing to get the several sets of pre-specified values for all γ_{ μ } and also some computational overhead to estimate the variance for each set of values. More recently, the dwSSA was developed in [24] to automatically choose parameter values for the wSSA by applying the cross-entropy method originally proposed in [25] for optimizing the importance sampling method.
3 Weighted NRM
The wSSA is based on the DM for the SSA, which needs to generate two random variables in each simulation step. However, the NRM of Gibson and Bruck [4] requires only one random variable in each simulation step. In this section, we apply the importance sampling technique to the NRM and develop the wNRM.
where b_{ m }(x) = γ_{ m }a_{ m }(x) is defined in the same way as in the wSSA. It is easy to verify that d_{0}(x) = Σ_{ m = 1}^{ M } d_{ m }(x) = a_{0}(x). If we generate τ_{ m } from an exponential distribution p(τ_{ m }) = d_{ m }(x) exp(−d_{ m }(x)τ_{ m }), τ_{ m } > 0, as the waiting time of reaction channel m, and choose μ = arg min_{ m }{τ_{ m }, m = 1, ..., M} as the index of the channel that fires, then it can be easily shown that the PDF of τ = min{τ_{ m }, m = 1, ..., M} follows the exponential distribution in (3) and that the probability of reaction μ is q_{ μ } = d_{ μ }(x)/d_{0}(x) = b_{ μ }(x)/b_{0}(x). If we repeated this procedure in each simulation step, we would have modified the first reaction method (FRM) [3] for the standard SSA into a weighted FRM (wFRM). Clearly, the wFRM is not efficient, since it generates M random variables in each step. However, following Gibson and Bruck [4], we can convert the wFRM into a more efficient wNRM by reusing the τ_{ m }s.
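The weighted first-reaction step just described can be sketched as follows (illustrative Python, with d_{ m }(x) = a_{0}(x)b_{ m }(x)/b_{0}(x) so that the minimum of the waiting times keeps the exponential distribution in (3)):

```python
import math
import random

def wfrm_step(a, gamma, rng):
    """One step of the weighted first reaction method (sketch).

    a:     propensities a_m(x) for the current state
    gamma: importance sampling parameters; b_m = gamma_m * a_m
    Returns (tau, mu, w_factor): waiting time, index of the firing
    channel, and the likelihood-ratio factor a_mu/d_mu to multiply
    into the trajectory weight.
    """
    a0 = sum(a)
    b = [g * am for g, am in zip(gamma, a)]
    b0 = sum(b)
    # d_m = a0*b_m/b0, so sum(d) = a0 and P(channel m fires) = b_m/b0
    d = [a0 * bm / b0 for bm in b]
    taus = [-math.log(1.0 - rng.random()) / dm if dm > 0 else math.inf
            for dm in d]
    mu = min(range(len(taus)), key=taus.__getitem__)
    return taus[mu], mu, a[mu] / d[mu]
```

This sketch regenerates all M exponentials per step, i.e. it is the wFRM; the wNRM described next avoids that by reusing the unfired times.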
Following Gibson and Bruck [4], we can show that the new τ_{ m } − t, m = 1, ..., M, are independent exponential random variables with the updated parameters d_{ m }(x), m = 1, ..., M, respectively. Therefore, in the next step, we can choose μ = arg min_{ m }{τ_{ m }, m = 1, ..., M} as the index of the channel that fires, as done in the NRM, update t as t = τ_{ μ }, and then repeat the process just described. Clearly, the wNRM only needs to generate one random variable in each step. We can further improve the efficiency of the wNRM by using the dependency graph and the indexed priority queue defined by Gibson and Bruck [4]. The dependency graph tells precisely which propensity functions need to be updated after a reaction occurs. The indexed priority queue can be exploited to find the minimum τ_{ m } and the reaction index in each step more efficiently than finding the reaction index from the PMF (4) directly, as done in the DM. However, some computational overhead is needed to maintain the data structure of the indexed priority queue.
Essentially, our wNRM runs the simulation in the same way as the NRM except that the wNRM generates τ_{ m } using the parameter d_{ m }(x) instead of a_{ m }(x). To estimate the probability of the rare event, we calculate a weight in each step and obtain the estimate using (7). The wNRM is summarized in the following algorithm:
Algorithm 1 (wNRM)
1. k_{1} ← 0, k_{2} ← 0, set values for all γ_{ m }; generate a dependency graph.
2. for i = 1 to n, do
3. t ← 0, x ← x_{0}, w ← 1.
4. evaluate a_{ m } (x) and b_{ m } (x) for all m; calculate all d_{ m } (x).
5. for each m, generate a unit-interval uniform random variable r_{ m }; τ_{ m } = ln(1/r_{ m })/d_{ m }(x).
6. store all τ_{ m } in an indexed priority queue.
7. while t ≤ T, do
8. if x ∈ Ω, then
9. k_{1} ← k_{1} + w, k_{2} ← k_{2} + w^{2}
10. break out of the while loop
11. end if
12. find μ = arg _{ m } min{τ_{ m } , m = 1, ..., M} and τ = min{τ_{ m } , m = 1, ..., M} from .
13. w ← w × a_{ μ } (x)/d_{ μ } (x).
14. x ← x + ν_{ μ }, t ← τ.
15. find the a_{ m }(x) that need to be updated from the dependency graph; evaluate these a_{ m }(x) and the corresponding b_{ m }(x); calculate the corresponding updated d_{ m }(x).
16. for all m ≠ μ, τ_{ m } ← t + [d_{ m }^{old}(x)/d_{ m }^{new}(x)](τ_{ m } − t); generate a unit-interval uniform random variable r_{ μ }; τ_{ μ } ← t + ln(1/r_{ μ })/d_{ μ }(x).
17. update the indexed priority queue with the new τ_{ m } values.
18. end while
19. end for
20. σ^{2} ← k_{2}/n − (k_{1}/n)^{2}.
21. calculate the estimate P̂(E_{ R }) = k_{1}/n, with a 68% uncertainty of ±(σ^{2}/n)^{1/2}.
Note that Gibson and Bruck [4] argued that the NRM is more efficient than the DM of Gillespie's SSA for loosely coupled chemical reaction systems. On the other hand, Cao et al. [5] optimized the DM and argued that the optimized DM is more efficient for most practical reaction systems. Regardless of the debate about efficiency, here we propose the wNRM as an alternative to the wSSA, which is based on the DM. While our simulation results in Section 5 demonstrate that the wNRM is more efficient than the refined wSSA for the three reaction systems tested, the wSSA may be more efficient in simulating some other systems.
As in the wSSA, Algorithm 1 does not provide a method for selecting the values of the parameters γ_{ m }, m = 1, ..., M. Although we could incorporate the try-and-test method of the refined wSSA into Algorithm 1, we will develop a more systematic method for selecting parameters in the next section. This parameter selection method is applicable to both the wSSA and the wNRM and can significantly improve the performance of both algorithms, as will be demonstrated in Section 5.
4 Parameter selection for wSSA and wNRM
where Q_{ J } is the probability used in the simulation to generate trajectory J, which is different from the true probability P_{ J } if the original system evolves naturally. If we make n simulation runs with altered trajectory probabilities, (11) implies that we can estimate P(E_{ R }) as the sample average of the weighted indicators, which is essentially (7). The variance of the estimate depends on the Q_{ J }s. Appropriate Q_{ J }s yield a small variance, thereby improving the accuracy of the estimate or, equivalently, reducing the number of runs needed for a given variance. The "rule of thumb" [23, 26–28] for choosing good Q_{ J }s is that Q_{ J } should be roughly proportional to P_{ J } for the trajectories J that lead to the rare event. However, at least two difficulties arise if we apply the rule of thumb based on (11). First, the number of all possible trajectories is very large, and we do not know the trajectories that lead to the rare event or their probabilities. Second, since we can only adjust the probability of each reaction in each step, it is not clear how this adjustment affects the probability of a trajectory. To overcome these difficulties, we next use an alternative expression for P(E_{ R }), based on which we apply the importance sampling technique.
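The payoff of making the sampling distribution roughly proportional to the quantity of interest can be seen in a self-contained toy example unrelated to the paper's systems: estimating the tail probability P(X ≥ 8) for X ~ Exp(1), whose exact value is e^{−8} ≈ 3.35 × 10^{−4}, by sampling from a tilted Exp(λ) density and weighting each hit by the likelihood ratio:

```python
import math
import random

def is_tail_prob(n, threshold=8.0, lam=0.125, seed=0):
    """Importance sampling estimate of P(X >= threshold), X ~ Exp(1).

    Samples are drawn from Exp(lam) with lam < 1, which puts most of
    the mass near the rare region; each sample beyond the threshold
    carries the weight p(x)/q(x) = exp(-x) / (lam * exp(-lam * x)).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = -math.log(1.0 - rng.random()) / lam   # x ~ Exp(lam)
        if x >= threshold:
            total += math.exp(-x) / (lam * math.exp(-lam * x))
    return total / n
```

With λ = 1/8 roughly a third of the samples land in the rare region, so a modest n gives a tight estimate, whereas plain Monte Carlo with the same n would typically see only a handful of hits.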
If Q(E_{ K } ) is the probability of event E_{ K } in the weighted system that evolves with adjusted probability rate constants, the rule of thumb for choosing good Q(E_{ K } ) is that we should make Q(E_{ K } ) approximately proportional to P(E_{ K } ). However, it is still difficult to apply the rule of thumb, because it is difficult to control every Q(E_{ K } ) simultaneously. Hence, we relax the rule of thumb and will maximize the Q(E_{ K } ) corresponding to the maximum P(E_{ K } ) or the one near maximum if the exact maximum P(E_{ K } ) cannot be determined precisely. The rationale of this heuristic rule is based on the following argument. If is the maximum one among all , the sum of and its closely related terms, such as , , and , very likely dominates the sum in the righthand side of (12). Maximizing not only proportionally increases , and its closely related terms, such as , , and , but also significantly increases the probability of the occurrence of the rare event. Note that a similar heuristic rule relying on the event with maximum probability was proposed in [29] for estimating the probability of rare events in highly reliable Markovian systems.
We first divide all reactions into three groups using the following general rule: the G_{1} group consists of reactions with ν_{ im }η > 0, the G_{2} group consists of reactions with ν_{ im }η < 0, and the G_{3} group consists of reactions with ν_{ im } = 0. The rationale for the partition rule is that the reactions in the G_{1} (G_{2}) group increase (decrease) the probability of the rare event and that the reactions in the G_{3} group do not affect X_{ i }(t) directly. We further refine the partition rule as follows. If a reaction R_{ m } is in the G_{1} group based on the general rule but a_{ m }(x) = 0 whenever one R_{ m } reaction occurs, we move R_{ m } into the G_{3} group. Similarly, if a reaction R_{ m } is in the G_{2} group based on the general rule but a_{ m }(x) = 0 whenever one R_{ m } reaction occurs, we move R_{ m } into the G_{3} group. For most cases, we only need the general partition rule. The refining rule described here is to deal with the situation where one or several X_{ i }(t)s always take the value 1 or 0, as in the system considered in Section 5.3. More refining rules may be added following the rationale just described, after we see more real-world reaction systems.
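The general partition rule translates directly into code; a sketch (the function name and argument layout are our own, and the refinement for propensities that vanish after one firing is omitted):

```python
def partition_reactions(nu, i, eta_sign):
    """General partition rule: reaction m goes to G1 if
    nu[i][m] * eta_sign > 0 (it pushes species i toward the rare-event
    threshold), to G2 if the product is negative, and to G3 if
    nu[i][m] == 0.  nu[i][m] is the state change of species i caused
    by one firing of reaction m."""
    g1, g2, g3 = [], [], []
    for m, nu_im in enumerate(nu[i]):
        if nu_im * eta_sign > 0:
            g1.append(m)
        elif nu_im * eta_sign < 0:
            g2.append(m)
        else:
            g3.append(m)
    return g1, g2, g3
```

For example, a species produced by reaction 0, degraded by reaction 1, and untouched by reactions 2 and 3 yields G_{1} = {0}, G_{2} = {1}, G_{3} = {2, 3} when η > 0, and the first two groups swap when η < 0.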
Since both (14) and (15) need to be satisfied in order for the event to occur and since , and , we get the second requirement for K_{ E } : K_{ E } ≥ η. Combining the two requirements on K_{ E } , we obtain .
The probability P(E_{ K }) can be expressed as . Since P(X(t) ∈ Ω | K_{ t } = K) is determined by the constant K, it is independent of t. Hence, we have . Due to the unimodal distribution of K_{ t } we mentioned earlier, we have for those ; for those K close to ; and quickly decreases to zero when K increases beyond . In other words, is approximately a constant for and quickly decreases to zero when . Now let us consider the event E_{ K } with K = η in the case . In this case, P(X(t) ∈ Ω | K_{ t } = K) is very small, because this is an extreme case where and if η > 0 or and if η < 0. Therefore, we can increase P(E_{ K }) if we increase K, but we do not want to increase K too much, because, as we discussed, decreases quickly when K increases in the case . Consequently, we suggest choosing , where is the standard deviation of K_{ T }, which can be estimated by making hundreds of runs of the standard SSA. In the case , we choose based on the same argument that decreases quickly if we further increase K_{ E }.
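Estimating the mean and standard deviation of K_{ T } indeed needs only a few hundred unweighted SSA pre-runs. One plausible reading of this selection procedure, sketched on a hypothetical birth–death model (the rates, the function name and the max(η, ·) combination of the two requirements on K_{ E } are our assumptions):

```python
import math
import random

def choose_KE(eta, n_pre, T, x0, c1, c2, seed=0):
    """Pick the target total reaction count K_E from pre-runs of the
    plain SSA on an illustrative birth-death model: K_E is the sample
    mean of K_T plus one sample standard deviation, but at least eta."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_pre):
        t, x, k = 0.0, x0, 0
        while True:
            a0 = c1 + c2 * x                       # total propensity
            t += -math.log(1.0 - rng.random()) / a0
            if t > T:
                break
            k += 1                                 # one more reaction before T
            x += 1 if rng.random() * a0 < c1 else -1
        counts.append(k)
    mean = sum(counts) / n_pre
    std = math.sqrt(sum((k - mean) ** 2 for k in counts) / (n_pre - 1))
    return max(eta, mean + std)
```

The pre-run cost is a few hundred short trajectories, negligible next to the millions of weighted runs used for the estimate itself.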
We next consider systems with only G_{1} and G_{2} reaction groups and then consider more general systems with all three reaction groups.
4.1 Systems with G_{1} and G_{2} reaction groups
where and . It is easy to verify that and . As defined in (8), the weight for estimating the probability of the rare event is w_{ μ } = p_{ μ } /q_{ μ } if the μ th reaction channel fires.
4.2 Systems with G_{1}, G_{2} and G_{3} reaction groups
where and as determined earlier. Since there are (K_{ E } − η)/2 + 1 terms in the sum in (21), it is difficult to find , and that maximize Q(X(t) ∈ Ω | K_{ t } = K_{ E }). So we will use a different approach to find , and , as described in the following.
Suppose that the (κ + 1)th term of the sum in (22) is the largest. We further relax the rule of thumb and maximize the (κ + 1)th term of the sum in (21) to find , and .
Specifically, we calculate all from (23). If but , then is a local maximum. After obtaining all the local maxima, we can find the global maximum f(κ) among the local maxima.
where .
While we could use q_{ m } in (25) to generate reactions in the G_{3} group, we next develop an optional method for fine-tuning q_{ m }, m ∈ G_{3}, which can further reduce the variance of the estimate. We divide the G_{3} group into three subgroups: G_{31}, G_{32} and G_{33}. Occurrence of reactions in the G_{31} group increases the probability of occurrence of reactions in the G_{1} group or reduces the probability of occurrence of reactions in the G_{2} group, which in turn increases the probability of the rare event. Occurrence of reactions in the G_{32} group reduces the probability of occurrence of reactions in the G_{1} group or increases the probability of occurrence of reactions in the G_{2} group, which reduces the probability of the rare event. Occurrence of reactions in the G_{33} group does not change the probability of occurrence of reactions in the G_{1} and G_{2} groups, and thus does not change the probability of the rare event.
where α, β ∈ (0, 1) are two pre-specified constants. It is not difficult to verify from (26) that . To ensure that and , we choose α and β satisfying 0 ≤ β < 1 and .
where , i = 1, 2, 3.
4.3 Systems with dimerization reactions
So far, we have assumed that the system does not have any dimerization reactions, i.e., the system consists of reactions with |ν_{ im }| = 0 or 1. We now generalize the methods developed earlier to systems with dimerization reactions. If there are dimerization reactions in the G_{1} and G_{2} groups, we further divide the G_{1} group into G_{11} and G_{12} subgroups and the G_{2} group into G_{21} and G_{22} subgroups. The G_{11} group contains reactions with ν_{ im } sign(η) = 1, where sign(η) = 1 when η > 0 and sign(η) = −1 when η < 0. The G_{12} group contains reactions with ν_{ im } sign(η) = 2. The G_{21} group contains reactions with ν_{ im } sign(η) = −1, while the G_{22} group contains reactions with ν_{ im } sign(η) = −2.
Let us consider systems with G_{1} and G_{2} groups but without G_{3} group. Although we still have or equivalently , we cannot obtain four unknowns , , and from only two equations.
where and .
We then substitute and into (20) to get q_{ m } .
Now let us consider systems with G_{1}, G_{2} and G_{3} reactions. From (29), we have , and from (15) and (29), we obtain . Since , we have . Following the derivations in Section 4.2, we can get q_{ m } for any reaction. More specifically, substituting , and the upper limit of into (21), we obtain Q(X(t) ∈ Ω | K_{ t } = K_{ E }). We can also get P(X(t) ∈ Ω | K_{ t } = K_{ E }) similar to (22) by replacing in Q(X(t) ∈ Ω | K_{ t } = K_{ E }) with . Then, we determine the maximum term of the sum in P(X(t) ∈ Ω | K_{ t } = K_{ E }) and denote the value of corresponding to the maximum term as κ + 1. We find , and by maximizing the (κ + 1)th term of the sum in Q(X(t) ∈ Ω | K_{ t } = K_{ E }). Finally, we substitute and into (20) to get q_{ m }, m ∈ G_{1} or G_{2}. For the reactions in the G_{3} group, we can either substitute into (25) to obtain q_{ m } or, if we want to fine-tune q_{ m }, use (26) and (27) to get q_{ m }.
4.4 wSSA and wNRM with parameter selection
The key to determining the probability q_{ m } of each reaction is to find the total probability of each group, , , , , and . This requires the average number of reactions of each group occurring during the interval [0, T] in the original system, , , , , , , , . If the system is relatively simple, we may get these numbers analytically. If we cannot obtain them analytically, we can estimate them by running Gillespie's exact SSA. Since the number of runs needed to estimate these numbers is much smaller than the number of runs needed to estimate the probability of the rare event, the computational overhead is negligible.
We next summarize the wSSA incorporating the parameter selection method in the following algorithm. We will not include the procedure for fine-tuning the probability rate constants of reactions in the G_{3} group, but will describe how to add this optional procedure to the algorithm. We will also describe how to modify Algorithm 1 to incorporate the parameter selection procedure into the wNRM.
Algorithm 2 (wSSA with parameter selection)
1. run Gillespie's exact SSA 10^{3}–10^{4} times to get estimates of , , , , , and ; determine K_{ E } from (16).
2. if the system has only G_{1} and G_{2} reactions, calculate and from (19) if there is no dimerization reaction or from (30) if there are dimerization reaction(s); if the system has G_{1}, G_{2} and G_{3} reactions, calculate , and from (24).
3. k_{1} ← 0, k_{2} ← 0.
4. for i = 1 to n, do
5. t ← 0, x ← x_{0}, w ← 1.
6. while t ≤ T, do
7. if x ∈ Ω, then
8. k_{1} ← k_{1} + w, k_{2} ← k_{2} + w^{2}
9. break out of the while loop
10. end if
11. evaluate all a_{ m } (x); calculate a_{0}(x).
12. generate two unit-interval uniform random variables r_{1} and r_{2}.
13. τ ← ln(1/r_{1})/a_{0}(x)
14. calculate all q_{ m } from (20) and (25).
15. μ ← the smallest integer satisfying Σ_{ m = 1}^{ μ } q_{ m }(x) > r_{2}q_{0}(x).
16. w ← w × (a_{ μ } (x)/a_{0}(x))/(q_{ μ } (x)/q_{0}(x)).
17. x ← x + ν_{ μ }, t ← t + τ.
18. end while
19. end for
20. σ^{2} ← k_{2}/n − (k_{1}/n)^{2}.
21. estimate P̂(E_{ R }) = k_{1}/n, with a 68% uncertainty of ±(σ^{2}/n)^{1/2}.
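The final two steps amount to standard sample-mean and sample-variance bookkeeping over the accumulated weight sums k_{1} and k_{2}; a sketch (the helper name is ours):

```python
import math

def finalize(k1, k2, n):
    """Turn the accumulated weight sums into the estimate and its 68%
    error bar: k1 is the sum of weights of successful runs, k2 the sum
    of squared weights; p_hat = k1/n, sigma2 = k2/n - p_hat**2, and
    the one-standard-error bar is sqrt(sigma2/n)."""
    p_hat = k1 / n
    sigma2 = k2 / n - p_hat ** 2
    return p_hat, math.sqrt(sigma2 / n)
```

In the unweighted special case every weight is 1, so k2 = k1 and the variance reduces to the familiar binomial p(1 − p).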
If the system has G_{3} reactions and we want to fine-tune the probability rate constants of the reactions in the G_{3} group, we modify Algorithm 2 as follows. In step 1, we also estimate , and and choose the values of α and β in (26). In step 2, we also calculate , and from (26). In step 14, we calculate q_{ m } for G_{3} reactions from (27) instead of (25). Compared with the refined wSSA [23], the wSSA with our parameter selection procedure does not need to guess the parameters for adjusting the probability q_{ m } of each reaction, but directly calculates q_{ m } using a systematically developed method. This has two main advantages. First, our method always adjusts q_{ m } appropriately to reduce the variance of the estimate, whereas the refined wSSA may not adjust q_{ m } as well as our method, especially if the initially guessed values are far away from the optimal values. Second, as we mentioned earlier, the computational overhead of our method is negligible, whereas the refined wSSA requires non-negligible computational overhead for determining the parameters. Indeed, as we will show in Section 5, the variance of the estimate provided by the wSSA with our parameter selection method can be more than one order of magnitude lower than that provided by the refined wSSA for a given number of runs n. Moreover, the wSSA with our parameter selection method is faster than the refined wSSA, since it requires less computational overhead to adjust q_{ m }.
We can also incorporate our parameter selection method, without the fine-tuning procedure, into the wNRM as follows. We replace the first step of Algorithm 1 with the first three steps of Algorithm 2. We then modify the fourth step of Algorithm 1 as follows: evaluate all a_{ m }(x), calculate all q_{ m } from (20) and (25), and calculate all d_{ m }(x) as d_{ m }(x) = q_{ m }a_{0}(x). Finally, we change the fifth step of Algorithm 1 to the following: find the a_{ m }(x) that need to be updated from the dependency graph and evaluate these a_{ m }(x); calculate all q_{ m } from (20) and (25), and calculate all the corresponding d_{ m }(x). We can also fine-tune the probability rate constants of the G_{3} reactions in the wNRM in the same way as described in the previous paragraph for the wSSA. Note that since our parameter selection method employs a systematic method for partitioning reactions into three groups, as discussed earlier, it can be applied to any real chemical reaction system.
5 Numerical examples
In this section, we present simulation results for several chemical reaction systems to demonstrate the accuracy and efficiency of the wSSA and wNRM with our parameter selection method, which we refer to as wSSAps and wNRMps, respectively, in the rest of the paper. All simulations were run in Matlab on a PC with an Intel dual-core 2.67 GHz CPU and 3 GB of memory, running Windows XP.
5.1 Single species production-degradation model
In reaction R_{1}, species S_{1} synthesizes species S_{2} with a probability rate constant c_{1}, while in reaction R_{2}, species S_{2} is degraded with a probability rate constant c_{2}. We used the same initial state and probability rate constants as used in [22, 23]: X_{1}(0) = 1, X_{2}(0) = 40, c_{1} = 1 and c_{2} = 0.025.
It is observed that the system is at equilibrium, since a_{1}(x_{0}) = c_{1} × X_{1}(0) = c_{2} × X_{2}(0) = a_{2}(x_{0}). It can be shown [22] that X_{2}(t) is a Poisson random variable with mean equal to 40. References [22, 23] sought to estimate P(E_{ R }) = P_{t≤100}(X_{2} → θ | x_{0}), the probability that X_{2}(t) = θ for some t ≤ 100, for several values of θ between 65 and 80. Since θ is about four to six standard deviations above the mean value 40, P_{t≤100}(X_{2} → θ | x_{0}) is very small.
Kuwahara and Mura [22] employed the wSSA to estimate P(E_{ R }) and used b_{1}(x) = δa_{1}(x) and b_{2}(x) = (1/δ)a_{2}(x) with δ = 1.2 for four different values of θ: 65, 70, 75 and 80. Gillespie et al. [23] applied the refined wSSA to estimate P(E_{ R }) and determined b_{1}(x) and b_{2}(x) in the same way, but found that δ = 1.2 is near-optimal for θ = 65 and that δ = 1.3 is near-optimal for θ = 80. We repeated the simulation of Gillespie et al. [23] for θ = 65, 70, 75 and 80 with δ = 1.2, 1.25, 1.25 and 1.3, respectively. We then applied the wSSAps and the wNRMps to estimate P(E_{ R }) for θ = 65, 70, 75 and 80. This system has only two types of reaction: R_{1} is a G_{1} reaction and R_{2} is a G_{2} reaction. Since the system is at equilibrium with a_{0}(x_{0}) = 2, with T = 100 is estimated to be 200, and thus . Using (19), we get and q_{2} = 1 − q_{1}.
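Following the b_{1}(x) = δa_{1}(x), b_{2}(x) = (1/δ)a_{2}(x) biasing described above, a compact wSSA run for this model can be sketched as follows (illustrative Python, not the authors' Matlab code):

```python
import math
import random

def wssa_prod_deg(n, theta, delta=1.2, T=100.0, x2_0=40,
                  c1=1.0, c2=0.025, seed=0):
    """wSSA estimate of P(X2 reaches theta before time T) for the
    production-degradation model, with R1 biased up by delta and R2
    biased down by 1/delta.  The dwell time keeps the true rate a0;
    only the channel choice is biased, and w records the likelihood
    ratio as in the wSSA."""
    rng = random.Random(seed)
    k1 = 0.0
    for _ in range(n):
        t, x2, w = 0.0, x2_0, 1.0
        while t <= T:
            if x2 >= theta:
                k1 += w                               # rare event reached
                break
            a1, a2 = c1, c2 * x2                      # true propensities
            a0 = a1 + a2
            b1, b2 = delta * a1, a2 / delta           # biased propensities
            b0 = b1 + b2
            t += -math.log(1.0 - rng.random()) / a0   # true dwell time
            if rng.random() * b0 < b1:                # biased choice of R1
                w *= (a1 / a0) / (b1 / b0)
                x2 += 1
            else:                                     # biased choice of R2
                w *= (a2 / a0) / (b2 / b0)
                x2 -= 1
    return k1 / n
```

With θ = 65 and δ = 1.2, even a few thousand runs should land near the 2.29 × 10^{−3} reported in the table that follows, whereas an unweighted batch of that size typically records only a handful of hits.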
Estimated probability of the rare event and the sample variance σ^{2}, as well as the CPU time (in s), with 10^{7} runs of the wNRMps, the wSSAps and the refined wSSA for the single species production-degradation model (31): (a) θ = 65 and 70 and (b) θ = 75 and 80

(a)             θ = 65                                     θ = 70
                Estimate         σ^{2}            Time     Estimate         σ^{2}            Time
wNRMps          2.29 × 10^{−3}   5.09 × 10^{−6}   14472    1.68 × 10^{−4}   3.40 × 10^{−8}   16140
wSSAps          2.29 × 10^{−3}   5.10 × 10^{−6}   16737    1.68 × 10^{−4}   3.40 × 10^{−8}   18555
Refined wSSA    2.29 × 10^{−3}   3.39 × 10^{−5}   24340    1.68 × 10^{−4}   4.29 × 10^{−7}   25492

(b)             θ = 75                                     θ = 80
                Estimate         σ^{2}            Time     Estimate         σ^{2}            Time
wNRMps          8.42 × 10^{−6}   1.10 × 10^{−10}  15640    2.99 × 10^{−7}   1.82 × 10^{−13}  16260
wSSAps          8.42 × 10^{−6}   1.10 × 10^{−10}  18582    2.99 × 10^{−7}   1.82 × 10^{−13}  18960
Refined wSSA    8.43 × 10^{−6}   3.58 × 10^{−9}   26314    2.99 × 10^{−7}   1.29 × 10^{−11}  26987
5.2 A reaction system with G_{1}, G_{2} and G_{3} reactions
This system is at equilibrium, and the mean value of X_{2}(t) is 40. We are interested in P(E_{ R }) = P_{t≤10}(X_{2} → θ | x(0)), the probability that X_{2}(t) = θ for some t ≤ 10. We chose θ = 65 and 68 in our simulations. To apply the wSSAps and the wNRMps to estimate P(E_{ R }), we divide the system into three groups. The G_{1} group contains reaction R_{1}; the G_{2} group includes reaction R_{2}; the G_{3} group consists of reactions R_{3} and R_{4}. When fine-tuning the parameters, we further divided G_{3} into a G_{31} group, which contains reaction R_{3}, and a G_{32} group, which contains reaction R_{4}. Since the system is at equilibrium and we have a_{0}(x_{0}) = 20, a_{1}(x_{0}) = 4, a_{2}(x_{0}) = 4, a_{3}(x_{0}) = 8 and a_{4}(x_{0}) = 4, we get , , , and . Therefore, we get and the following probabilities: , and .
If θ = 65, we have η = 25. Using (23), we obtained κ = 29. Substituting κ into (24), we got , and . We then chose α = 0.85 and β = 0.80 and calculated and from (26) as and . Similarly, if θ = 68, we got κ = 26, which resulted in and . Again, selecting α = 0.85 and β = 0.80, we got and . To test whether the wNRMps and the wSSAps are sensitive to parameters α and β, we also used another set of parameters α = 0.80 and β = 0.75.
Estimated probability of the rare event and the sample variance σ^{2}, as well as the CPU time (in s), with 10^{7} runs of the wNRMps, the wSSAps and the refined wSSA for the system given in (32): (a) θ = 65 and (b) θ = 68

(a) θ = 65
                                       Estimate         σ^{2}            Time
wNRMps without G_{3} fine-tuning       1.14 × 10^{−4}   2.77 × 10^{−7}   13381
wSSAps without G_{3} fine-tuning       1.14 × 10^{−4}   2.74 × 10^{−7}   17484
wNRMps with α = 0.85, β = 0.80         1.14 × 10^{−4}   1.27 × 10^{−7}   13504
wSSAps with α = 0.85, β = 0.80         1.14 × 10^{−4}   1.28 × 10^{−7}   16649
wNRMps with α = 0.80, β = 0.75         1.14 × 10^{−4}   1.29 × 10^{−7}   13540
wSSAps with α = 0.80, β = 0.75         1.14 × 10^{−4}   1.29 × 10^{−7}   17243
Refined wSSA                           1.14 × 10^{−4}   1.54 × 10^{−6}   24499

(b) θ = 68
                                       Estimate         σ^{2}            Time
wNRMps without G_{3} fine-tuning       1.49 × 10^{−5}   1.14 × 10^{−8}   14087
wSSAps without G_{3} fine-tuning       1.49 × 10^{−5}   1.09 × 10^{−8}   17285
wNRMps with α = 0.85, β = 0.80         1.49 × 10^{−5}   3.28 × 10^{−9}   13920
wSSAps with α = 0.85, β = 0.80         1.49 × 10^{−5}   3.29 × 10^{−9}   17862
wNRMps with α = 0.80, β = 0.75         1.49 × 10^{−5}   3.32 × 10^{−9}   14018
wSSAps with α = 0.80, β = 0.75         1.49 × 10^{−5}   3.30 × 10^{−9}   17858
Refined wSSA                           1.49 × 10^{−5}   7.93 × 10^{−8}   24739
5.3 Enzymatic futile cycle model
With the above rate constants and initial state, X_{2}(t) and X_{5}(t) tend to equilibrate about their initial value 50. References [22, 23] sought to estimate P(E_{ R }) = P_{t≤100}(X_{5} → θ | x(0)), the probability that X_{5}(t) = θ for some t ≤ 100, for several values of θ between 25 and 40. We repeated the simulations with the refined wSSA in [23] for θ = 25 and 40. The refined wSSA employed the following parameters: γ_{3} = δ, γ_{6} = 1/δ and γ_{ m } = 1, m = 1, 2, 4, 5, and we used the best value of δ determined in [23]: δ = 0.35 for θ = 25 and δ = 0.60 for θ = 40.
In this system, we always have X_{2}(t) + X_{5}(t) = 100. So when the rare event occurs at time t, we have X_{5}(t) = θ and X_{2}(t) = 100 − θ. The rare event is therefore defined as X_{5} = 50 + η with η = θ − 50 or, equivalently, X_{2} = 50 − η. According to the partition rule defined in Section 4, R_{3} is a G_{2} reaction; R_{6} is a G_{1} reaction; R_{1}, R_{2}, R_{4} and R_{5} are G_{3} reactions.
We ran Gillespie's SSA 10^{3} times and got an estimate of as , and thus . When θ = 40, we have η = −10. Using (23) and K_{ E } = 432, we obtained κ = 6. Substituting κ into (24), we got , and . In this example, there are always certain reactions whose propensity functions are zero, since we always have X_{1}(t) + X_{3}(t) = 1 and X_{4}(t) + X_{6}(t) = 1. Due to this special property, we calculate the probability of each reaction as follows. The system has only 4 states in terms of X_{3}(t) and X_{6}(t): X_{3}(t)X_{6}(t) = 11, 01, 10 or 00. From the 10^{3} runs of Gillespie's exact SSA, we estimated the probability of reactions occurring in each state as P_{11} ≈ 1/2, P_{01} = P_{10} ≈ 1/4 and P_{00} ≈ 0. Note that reaction R_{6} only occurs in states 11 and 01, and we denote its probability in these two states used in the wSSAps as and , and its natural probability as and . The probability can be calculated as and can be approximated as assuming X_{2}(t) = 50, since the system is in equilibrium. Then, using the relationships and , we get and . Reaction R_{3} only occurs in states 11 and 10, and its probability can be obtained similarly as and . In a state s (s = 11, 01, 10 or 00), we calculate and then calculate , m = 1, 2, 4 and 5, from (25). Surprisingly, the values of , , and we calculated are very close to the values used in the refined wSSA, which were obtained by making 10^{5} runs of the refined wSSA for each of seven guessed values of γ. In contrast, we do not need to guess the values of the parameters but calculate them analytically, and all the information needed in our calculation was obtained from 10^{3} runs of Gillespie's exact SSA, which incurs negligible computational overhead.
When θ = 25, we have η = -25. Using (23) and , we obtained κ = 3. Substituting κ into (24), we got , and . Similar to the previous calculation, we got , , and , and then calculated the probabilities of the other reactions from (25). Again, the values of , , and we obtained are very close to the values used in the refined wSSA.
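For reference, the likelihood-ratio bookkeeping that the weighted methods compared here all share can be sketched in a few lines: each step fires a reaction chosen from biased "predilection" propensities b_j = δ_j a_j, and a running weight is multiplied by (a_j/a_0)/(b_j/b_0) so that the final estimator remains unbiased. The function below is an illustrative sketch of this generic mechanism, not the paper's implementation; the two-species reversible example at the end is likewise assumed.

```python
import random

def weighted_ssa_step(x, t, propensities, stoich, delta, w, rng):
    """One step of an importance-sampled (weighted) SSA.

    delta[j] > 1 encourages reaction j, delta[j] < 1 discourages it;
    the running weight w carries the likelihood-ratio correction that
    keeps the rare-event probability estimator unbiased.
    Returns (new_x, new_t, new_w), or None if no reaction can fire.
    """
    a = propensities(x)
    a0 = sum(a)
    if a0 == 0.0:
        return None
    b = [dj * aj for dj, aj in zip(delta, a)]   # biased propensities
    b0 = sum(b)
    t += rng.expovariate(a0)       # firing time still drawn from true a0
    r, j = rng.random() * b0, 0    # but the reaction index uses b_j / b0
    while r >= b[j]:
        r -= b[j]
        j += 1
    w *= (a[j] / a0) / (b[j] / b0)  # likelihood-ratio correction
    x = [xi + s for xi, s in zip(x, stoich[j])]
    return x, t, w

# Quick check: with no bias (all delta_j = 1) the weight stays exactly 1.
rng = random.Random(0)
x, t, w = [10, 0], 0.0, 1.0
prop = lambda s: [2.0 * s[0], 1.0 * s[1]]   # assumed toy system: A <-> B
stoich = [(-1, +1), (+1, -1)]
for _ in range(100):
    x, t, w = weighted_ssa_step(x, t, prop, stoich, [1.0, 1.0], w, rng)
print(w)  # 1.0
```

The estimate of the rare-event probability is then the average, over all runs, of the final weight for runs in which the event occurred and zero otherwise; the parameter selection method of this paper supplies the δ_j values analytically instead of by trial and error.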
Estimated probability of the rare event and the sample variance σ^{2}, as well as the CPU time (in s), with 10^{6} runs of the wNRMps, the wSSAps and the refined wSSA for the enzymatic futile cycle model (35): (a) θ = 25 and (b) θ = 40

(a)

Method         Estimate          σ^{2}             Time
wNRMps         1.74 × 10^{-7}    1.81 × 10^{-13}   4183.2
wSSAps         1.74 × 10^{-7}    1.80 × 10^{-13}   5316.9
Refined wSSA   1.74 × 10^{-7}    1.61 × 10^{-13}   5337.2

(b)

Method         Estimate          σ^{2}             Time
wNRMps         4.21 × 10^{-2}    1.51 × 10^{-3}    3589.4
wSSAps         4.21 × 10^{-2}    1.51 × 10^{-3}    4388.3
Refined wSSA   4.21 × 10^{-2}    1.51 × 10^{-3}    4406.6
6 Conclusion
The wSSA and the refined wSSA are innovative variations of Gillespie's standard SSA. They provide an efficient way to estimate the probability of rare events that occur in chemical reaction systems with an extremely low probability within a given time period. The wSSA was developed based on the direct method of the SSA. In this paper, we developed an alternative wNRM for estimating the probability of a rare event. We also devised a systematic method for selecting the values of the importance sampling parameters, which is absent in the wSSA and the refined wSSA.
This parameter selection method was then incorporated into the wSSA and the wNRM. Numerical examples demonstrated that, compared with the refined wSSA and the dwSSA, the wSSA and the wNRM with our parameter selection procedure can substantially reduce the variance of the estimated probability of the rare event and speed up simulation.
Competing interests
The authors declare that they have no competing interests.
Abbreviations
NRM: next reaction method
wNRM: weighted NRM
wSSA: weighted stochastic simulation algorithm.
Declarations
Acknowledgements
This work was supported by the National Science Foundation (NSF) under NSF CAREER Award no. 0746882.
References
 Kærn M, Elston TC, Blake WJ, Collins JJ: Stochasticity in gene expression: from theories to phenotypes. Nat Rev Genet 2005, 6: 451-464. 10.1038/nrg1615
 Gillespie DT: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 1976, 22: 403-434. 10.1016/0021-9991(76)90041-3
 Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Phys Chem 1977, 81: 2340-2361. 10.1021/j100540a008
 Gibson MA, Bruck J: Exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A 2000, 105: 1876-1889.
 Cao Y, Li H, Petzold LR: Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. J Chem Phys 2004, 121(9): 4059-4067. 10.1063/1.1778376
 Cai X: Exact stochastic simulation of coupled chemical reactions with delays. J Chem Phys 2007, 126: 124108. 10.1063/1.2710253
 Anderson DF: A modified next reaction method for simulating systems with time varying rate constants and systems with delays. J Chem Phys 2007, 127(21): 214107. 10.1063/1.2799998
 Gillespie DT: Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys 2001, 115: 1716-1733. 10.1063/1.1378322
 Gillespie DT, Petzold LR: Improved leap-size selection for accelerated stochastic simulation. J Chem Phys 2003, 119(6): 8229-8234. 10.1063/1.1613254
 Tian T, Burrage K: Binomial leap methods for simulating stochastic chemical kinetics. J Chem Phys 2004, 121: 10356-10364. 10.1063/1.1810475
 Chatterjee A, Vlachos DG, Katsoulakis MA: Binomial distribution based τ-leap accelerated stochastic simulation. J Chem Phys 2005, 122: 024112. 10.1063/1.1833357
 Cao Y, Gillespie DT, Petzold LR: Efficient step size selection for the tau-leap simulation method. J Chem Phys 2006, 124(4): 044109. 10.1063/1.2159468
 Cao Y, Gillespie D, Petzold L: Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. J Comput Phys 2005, 206: 395-411. 10.1016/j.jcp.2004.12.014
 Goutsias J: Quasi-equilibrium approximation of fast reaction kinetics in stochastic biochemical systems. J Chem Phys 2005, 122: 184102. 10.1063/1.1889434
 Haseltine EL, Rawlings JB: Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J Chem Phys 2002, 117(15): 6959-6969. 10.1063/1.1505860
 Rao CV, Arkin AP: Stochastic chemical kinetics and the quasi-steady-state assumption: application to the Gillespie algorithm. J Chem Phys 2003, 118(11): 4999-5010.
 Salis H, Kaznessis Y: Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions. J Chem Phys 2005, 122: 054103.
 Auger A, Chatelain P, Koumoutsakos P: R-leaping: accelerating the stochastic simulation algorithm by reaction leaps. J Chem Phys 2006, 125(8): 084103. 10.1063/1.2218339
 Cai X, Xu Z: K-leap methods for accelerating stochastic simulation of chemically reacting systems. J Chem Phys 2007, 126(7): 074102. 10.1063/1.2436869
 Csete M, Doyle J: Bow ties, metabolism and disease. Trends Biotechnol 2004, 22: 446-450. 10.1016/j.tibtech.2004.07.007
 Egger G, Liang G, Aparicio A, Jones PA: Epigenetics in human disease and prospects for epigenetic therapy. Nature 2004, 429: 457. 10.1038/nature02625
 Kuwahara H, Mura I: An efficient and exact stochastic simulation method to analyze rare events in biochemical systems. J Chem Phys 2008, 129: 165101. 10.1063/1.2987701
 Gillespie DT, Roh M, Petzold LR: Refining the weighted stochastic simulation algorithm. J Chem Phys 2009, 130: 174103. 10.1063/1.3116791
 Daigle BJ, Roh M, Gillespie DT, Petzold LR: Automated estimation of rare event probabilities in biochemical systems. J Chem Phys 2010, 134: 044110.
 Rubinstein RY, Kroese DP: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Springer; 2004.
 Liu JS: Monte Carlo Strategies in Scientific Computing. Springer; 2001.
 Bucklew JA: Introduction to Rare Event Simulation. Springer, New York; 2004.
 Rubino G, Tuffin B (eds): Rare Event Simulation using Monte Carlo Methods. Wiley, New York; 2009.
 L'Ecuyer P, Tuffin B: Effective approximation of zero-variance simulation in a reliability setting. Proc. European Simulation and Modelling Conf. (ESM'2007), St. Julians, Malta 2007, 48-57.
 Gillespie DT: The chemical Langevin equation. J Chem Phys 2000, 113(1): 297-306. 10.1063/1.481811
 Xu Z, Cai X: Unbiased tau-leap methods for stochastic simulation of chemically reacting systems. J Chem Phys 2008, 128(15): 154112. 10.1063/1.2894479
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.