Let us denote the set of all possible state trajectories in the time interval $[0, T]$ as $\mathcal{J}$ and the set of trajectories that first reach any state in $\Omega$ during $[0, T]$ as $\mathcal{J}_E$. Let the probability of a trajectory $J$ be $P_J$. Then, we have $P(E_R) = \sum_{J \in \mathcal{J}} I(J) P_J$, where the indicator function $I(J) = 1$ if $J \in \mathcal{J}_E$ or 0 if $J \notin \mathcal{J}_E$. Importance sampling used in the wSSA and the wNRM arises from the fact that we can write $P(E_R)$ as

$$P(E_R) = \sum_{J \in \mathcal{J}} I(J) \frac{P_J}{Q_J} Q_J, \tag{11}$$

where $Q_J$ is the probability used in simulation to generate trajectory $J$, which is different from the true probability $P_J$ that applies if the original system evolves naturally. If we make $n$ simulation runs with altered trajectory probabilities, (11) implies that we can estimate $P(E_R)$ as $\hat{P}(E_R) = \frac{1}{n} \sum_{i=1}^{n} I(J_i) P_{J_i}/Q_{J_i}$, where $J_i$ is the trajectory generated in the $i$th run, which is essentially (7). The variance of $\hat{P}(E_R)$ depends on the $Q_J$s. Appropriate $Q_J$s yield small variance, thereby improving the accuracy of the estimate or equivalently reducing the number of runs for a given variance. The "rule of thumb" [23, 26–28] for choosing good $Q_J$s is that $Q_J$ should be roughly proportional to $I(J) P_J$. However, at least two difficulties arise if we apply the rule of thumb based on (11). First, the number of all possible trajectories is very large and we do not know the trajectories that lead to the rare event and their probabilities. Second, since we can only adjust the probability of each reaction in each step, it is not clear how this adjustment affects the probability of a trajectory. To overcome these difficulties, we next use an alternative expression for $P(E_R)$, based on which we apply the importance sampling technique.
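The trajectory-level identity above can be illustrated on a toy problem that is much simpler than a reaction network. The following is a minimal sketch, not the wSSA itself: a "trajectory" is a sequence of independent Bernoulli steps, the rare event is a large number of successes, and each step is sampled with an altered probability `q` while the likelihood ratio is accumulated. The function names and the toy model are assumptions made for illustration only.

```python
import random
from math import comb


def is_estimate(n_runs, steps, p, q, threshold, seed=0):
    """Importance-sampling estimate of P(#successes >= threshold).

    Each of the `steps` trials truly succeeds with probability p, but is
    sampled with probability q; the weight is the likelihood ratio P_J / Q_J.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        weight, successes = 1.0, 0
        for _ in range(steps):
            if rng.random() < q:
                successes += 1
                weight *= p / q
            else:
                weight *= (1.0 - p) / (1.0 - q)
        if successes >= threshold:  # indicator I(J)
            total += weight
    return total / n_runs


def exact_tail(steps, p, threshold):
    """Exact binomial tail probability, used only as a reference value."""
    return sum(comb(steps, k) * p**k * (1.0 - p) ** (steps - k)
               for k in range(threshold, steps + 1))
```

Calling `is_estimate(20000, 20, 0.1, 0.5, 10)` and comparing with `exact_tail(20, 0.1, 10)` shows how a sampling probability that makes the event common can recover a probability of order $10^{-6}$ from a modest number of runs.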

Let us denote the number of reactions occurring in the time interval $[0, t]$ as $K_t$ and the maximum value of $K_T$ as $K_{\max}$. Let $E_K$ be the rare event occurring at the $K$th reaction at any $t \le T$, and $P(E_K)$ be the probability of $E_K$ in the original system that evolves naturally with the original probability rate constants. Then, we have

$$P(E_R) = \sum_{K=1}^{K_{\max}} P(E_K). \tag{12}$$

If $Q(E_K)$ is the probability of event $E_K$ in the weighted system that evolves with adjusted probability rate constants, the rule of thumb for choosing good $Q(E_K)$ is that we should make $Q(E_K)$ approximately proportional to $P(E_K)$. However, it is still difficult to apply the rule of thumb, because it is difficult to control every $Q(E_K)$ simultaneously. Hence, we relax the rule of thumb and maximize the $Q(E_K)$ corresponding to the maximum $P(E_K)$, or to one near the maximum if the exact maximum $P(E_K)$ cannot be determined precisely. The rationale of this heuristic rule is based on the following argument. If $P(E_{K_E})$ is the maximum among all $P(E_K)$, the sum of $P(E_{K_E})$ and its closely related terms, i.e., the $P(E_K)$ with $K$ close to $K_E$, very likely dominates the sum on the right-hand side of (12). Maximizing $Q(E_{K_E})$ not only proportionally increases $Q(E_{K_E})$ and its closely related terms, but also significantly increases the probability of the occurrence of the rare event. Note that a similar heuristic rule relying on the event with maximum probability was proposed in [29] for estimating the probability of rare events in highly reliable Markovian systems.

Before proceeding with our derivations, we need to specify $\Omega$. In the rest of the paper, we assume that $\Omega$ contains one single state $\mathbf{X}$ defined as $X_i = X_i(0) + \eta$, where $\eta$ is a constant and $i \in \{1, 2, \ldots, N\}$. Let us denote the number of firings of the $m$th reaction channel in the trajectory leading to the rare event as $K_m$. Then, we have

$$\sum_{m=1}^{M} \nu_{im} K_m = \eta. \tag{13}$$

We first divide all reactions into three groups using the following general rule: the $G_1$ group consists of reactions with $\nu_{im}\eta > 0$, the $G_2$ group consists of reactions with $\nu_{im}\eta < 0$, and the $G_3$ group consists of reactions with $\nu_{im} = 0$. The rationale for the partition rule is that the reactions in the $G_1$ ($G_2$) group increase (decrease) the probability of the rare event and that the reactions in the $G_3$ group do not affect $X_i(t)$ directly. We further refine the partition rule as follows. If a reaction $R_m$ is in the $G_1$ group based on the general rule but $a_m(\mathbf{x}) = 0$ whenever one $R_m$ reaction occurs, we move $R_m$ into the $G_3$ group. Similarly, if a reaction $R_m$ is in the $G_2$ group based on the general rule but $a_m(\mathbf{x}) = 0$ whenever one $R_m$ reaction occurs, we move $R_m$ into the $G_3$ group. For most cases, we only need the general partition rule. The refining rule described here deals with the situation where one or several $X_i(t)$s always take values 1 or 0, as in the system considered in Section 5.3. More refining rules may be added following the rationale just described as more real-world reaction systems are examined.
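The general partition rule, together with the refinement that moves selected reactions into $G_3$, can be sketched as a small helper. This is only an illustration: the function name, the list `nu_i` holding $\nu_{im}$ for the species of interest, and the `moved_to_G3` argument (the reactions that the refinement rule would reassign, supplied by the user) are hypothetical.

```python
def partition_reactions(nu_i, eta, moved_to_G3=()):
    """Split reaction indices into G1, G2 and G3.

    nu_i[m]     : change of species i caused by one firing of reaction m
    eta         : nonzero offset defining the rare event X_i = X_i(0) + eta
    moved_to_G3 : indices reassigned to G3 by the refinement rule
    """
    groups = {"G1": [], "G2": [], "G3": []}
    for m, nu in enumerate(nu_i):
        if nu == 0 or m in moved_to_G3:
            groups["G3"].append(m)      # no direct effect on X_i
        elif nu * eta > 0:
            groups["G1"].append(m)      # moves X_i toward the target
        else:
            groups["G2"].append(m)      # moves X_i away from the target
    return groups
```

For example, `partition_reactions([1, -1, 0, 2], eta=-3)` places reaction 1 in $G_1$, reactions 0 and 3 in $G_2$, and reaction 2 in $G_3$.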

We typically only need to consider elementary reactions including bimolecular and monomolecular reactions [30]. Hence, the possible values for all $\nu_{im}$ are $0, \pm 1, \pm 2$. For simplicity of the derivations, we now only consider the case where $\nu_{im} = 0, \pm 1$, i.e., we assume that the system does not have any bimolecular reactions with two identical reactant molecules or dimerization reactions. We will later generalize our method to systems with dimerization reactions. Let us define $K_{G_1} = \sum_{m \in G_1} K_m$, $K_{G_2} = \sum_{m \in G_2} K_m$ and $K_{G_3} = \sum_{m \in G_3} K_m$; then (13) becomes

$$K_{G_1} - K_{G_2} = |\eta|. \tag{14}$$

Let us denote $\bar{K}_t$ as the expected value of $K_t$. Since the number of reactions occurring in any small time interval is approximately a Poisson random variable [8], $K_t$ is the sum of a large number of independent Poisson random variables when $t$ is relatively large. Then, by the central limit theorem, $K_t$ can be approximated as a Gaussian random variable with mean $\bar{K}_t$. Indeed, in all chemical reaction systems [6, 19, 31] we tested so far, we observed that $K_t$ follows a unimodal distribution with a peak at $\bar{K}_t$ and that its standard deviation is small relative to $\bar{K}_t$. Since the mean first passage time of the rare event is much larger than $T$ [23], the rare event most likely occurs at a time near $T$. Based on these two observations, we argue that for all . Therefore, we should have . When $E_{K_E}$ occurs, we have

$$K_{G_1} + K_{G_2} + K_{G_3} = K_E. \tag{15}$$

Since both (14) and (15) need to be satisfied in order for the event $E_{K_E}$ to occur and since $K_{G_1} \ge 0$, $K_{G_2} \ge 0$ and $K_{G_3} \ge 0$, we get the second requirement for $K_E$: $K_E \ge |\eta|$. Combining the two requirements on $K_E$, we obtain .

The probability $P(E_K)$ can be expressed as . Since $P(\mathbf{X}(t) \in \Omega \mid K_t = K)$ is determined by the constant $K$, it is independent of $t$. Hence, we have . Due to the unimodal distribution of $K_t$ we mentioned earlier, we have for those ; for those $K$ close to ; and quickly decreases to zero when $K$ increases beyond . In other words, is approximately a constant for and quickly decreases to zero when . Now let us consider event $E_K$ with $K = |\eta|$ in the case . In this case, $P(\mathbf{X}(t) \in \Omega \mid K_t = K)$ is very small because this is an extreme case where and if $\eta > 0$ or and if $\eta < 0$. Therefore, we can increase $P(E_K)$ if we increase $K$, but we do not want to increase $K$ too much because as we discussed decreases quickly when $K$ increases in the case . Consequently, we suggest that we choose , where is the standard deviation of $K_T$, which can be estimated by making hundreds of runs of the standard SSA. In case , we choose based on the same argument that decreases quickly if we further increase $K_E$.

Applying the relaxed rule of thumb, we need to adjust probability rate constants in simulation to maximize $Q(E_{K_E})$. Since we do not change the distribution of $\tau$, we do not change the distribution of $K_T$ and thus . Hence, maximizing $Q(E_{K_E})$ is equivalent to maximizing $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$. Now we are in a position to summarize our strategy of applying the importance sampling technique in simulation as follows: we will choose probability parameters to maximize $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$, where

We next consider systems with only *G*_{1} and *G*_{2} reaction groups and then consider more general systems with all three reaction groups.

### 4.1 Systems with $G_1$ and $G_2$ reaction groups

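Since we do not have the $G_3$ group, (15) becomes

$$K_{G_1} + K_{G_2} = K_E. \tag{17}$$

Combining (14) and (17), we get $K_{G_1} = (K_E + |\eta|)/2$ and $K_{G_2} = (K_E - |\eta|)/2$ if the final state after the last reaction occurs is in $\Omega$. The last reaction should be a reaction from the $G_1$ group. Otherwise, the state already reached $\Omega$ before the last reaction occurs. Suppose that in simulation the total probability of the occurrence of reactions in the $G_1$ group is a constant $Q_{G_1}$, so that the total probability of the occurrence of reactions in the $G_2$ group is $Q_{G_2} = 1 - Q_{G_1}$. Then, $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ can be found from a binomial distribution as follows

$$Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E) = \binom{K_E - 1}{K_{G_1} - 1} Q_{G_1}^{K_{G_1}} Q_{G_2}^{K_{G_2}}, \tag{18}$$

where $K_{G_1}$ and $K_{G_2}$ are as determined earlier. Setting the derivative of $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ with respect to $Q_{G_1}$ to zero, we get the $Q_{G_1}$ and $Q_{G_2}$ that maximize $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ as follows:

$$Q_{G_1} = \frac{K_{G_1}}{K_E} = \frac{K_E + |\eta|}{2K_E}, \qquad Q_{G_2} = \frac{K_{G_2}}{K_E} = \frac{K_E - |\eta|}{2K_E}. \tag{19}$$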

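To ensure that reactions in the $G_1$ ($G_2$) group occur with probability $Q_{G_1}$ ($Q_{G_2}$) in each step of simulation, we adjust the probability of each reaction as follows

$$q_m = \begin{cases} Q_{G_1}\, a_m(\mathbf{x}) / a_{G_1}(\mathbf{x}), & m \in G_1, \\ Q_{G_2}\, a_m(\mathbf{x}) / a_{G_2}(\mathbf{x}), & m \in G_2, \end{cases} \tag{20}$$

where $a_{G_1}(\mathbf{x}) = \sum_{m \in G_1} a_m(\mathbf{x})$ and $a_{G_2}(\mathbf{x}) = \sum_{m \in G_2} a_m(\mathbf{x})$. It is easy to verify that $\sum_{m \in G_1} q_m = Q_{G_1}$ and $\sum_{m \in G_2} q_m = Q_{G_2}$. As defined in (8), the weight for estimating the probability of the rare event is $w_\mu = p_\mu / q_\mu$ if the $\mu$th reaction channel fires.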
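A short numerical sketch may help to fix ideas for the $G_1$/$G_2$ case. It assumes that the optimal group probabilities are the fractions of $G_1$ and $G_2$ firings among the $K_E$ reactions, that the per-reaction probabilities are obtained by rescaling the propensities within each group so that within-group ratios are preserved, and that both groups have positive total propensity. The function names and the argument layout are hypothetical.

```python
def group_probabilities(K_E, eta):
    """Assumed optimum for a G1/G2-only system: Q_G1 = K_G1/K_E, Q_G2 = K_G2/K_E."""
    K_G1 = (K_E + abs(eta)) / 2.0   # firings that move toward the target
    K_G2 = (K_E - abs(eta)) / 2.0   # firings that move away from it
    return K_G1 / K_E, K_G2 / K_E


def reaction_probabilities(a, G1, G2, Q_G1, Q_G2):
    """Rescale propensities a[m] so that each group has the prescribed total probability."""
    a_G1 = sum(a[m] for m in G1)    # assumed > 0
    a_G2 = sum(a[m] for m in G2)    # assumed > 0
    q = [0.0] * len(a)
    for m in G1:
        q[m] = Q_G1 * a[m] / a_G1
    for m in G2:
        q[m] = Q_G2 * a[m] / a_G2
    return q


def step_weight(a, q, mu):
    """Weight factor p_mu / q_mu with the natural probability p_mu = a_mu / a_0."""
    return (a[mu] / sum(a)) / q[mu]
```

With `K_E = 20` and `eta = 4`, the group probabilities come out as 0.6 and 0.4; for propensities `[2.0, 6.0, 4.0]` with `G1 = [0, 1]` and `G2 = [2]`, the reaction probabilities are 0.15, 0.45 and 0.4.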

### 4.2 Systems with $G_1$, $G_2$ and $G_3$ reaction groups

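Combining (14) and (15), we get $K_{G_1} = K_{G_2} + |\eta|$ and $K_{G_3} = K_E - |\eta| - 2K_{G_2}$. Since $K_{G_3} \ge 0$, we have $K_{G_2} \le (K_E - |\eta|)/2$. Suppose that in simulation the total probabilities of the occurrence of reactions in $G_1$, $G_2$ and $G_3$ are constants $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$, respectively. Then, $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ can be found from a multinomial distribution as follows

$$Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E) = \sum_{K_{G_2}=0}^{(K_E - |\eta|)/2} \frac{(K_E - 1)!}{(K_{G_1} - 1)!\, K_{G_2}!\, K_{G_3}!}\, Q_{G_1}^{K_{G_1}} Q_{G_2}^{K_{G_2}} Q_{G_3}^{K_{G_3}}, \tag{21}$$

where $K_{G_1} = K_{G_2} + |\eta|$ and $K_{G_3} = K_E - |\eta| - 2K_{G_2}$ as determined earlier. Since there are $(K_E - |\eta|)/2 + 1$ terms in the sum in (21), it is difficult to find the $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$ that maximize $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$. So we will use a different approach to find $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$, as described in the following.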

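Let $\bar{K}_{G_1}$, $\bar{K}_{G_2}$ and $\bar{K}_{G_3}$ be the average numbers of reactions of the $G_1$, $G_2$ and $G_3$ groups that occur in the time interval $[0, T]$ in the original system. Since we have $\bar{K}_T = \bar{K}_{G_1} + \bar{K}_{G_2} + \bar{K}_{G_3}$, we define $P_{G_1} = \bar{K}_{G_1}/\bar{K}_T$, $P_{G_2} = \bar{K}_{G_2}/\bar{K}_T$, and $P_{G_3} = \bar{K}_{G_3}/\bar{K}_T$. Then, we can approximate $P(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ in the original system, which is the counterpart of $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ in the weighted system, using the right-hand side of (21) but with $Q_{G_i}$, $i = 1, 2, 3$, replaced by $P_{G_i}$, $i = 1, 2, 3$, respectively. This gives

$$P(\mathbf{X}(t) \in \Omega \mid K_t = K_E) \approx \sum_{K_{G_2}=0}^{(K_E - |\eta|)/2} \frac{(K_E - 1)!}{(K_{G_1} - 1)!\, K_{G_2}!\, K_{G_3}!}\, P_{G_1}^{K_{G_1}} P_{G_2}^{K_{G_2}} P_{G_3}^{K_{G_3}}. \tag{22}$$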

Suppose that the $(\kappa + 1)$th term of the sum in (22) is the largest. We further relax the rule of thumb and maximize the $(\kappa + 1)$th term of the sum in (21) to find $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$.

It is not difficult to find the $(\kappa + 1)$th term of the sum in (22). Let us denote the term of the sum in (22) as . We can exhaustively search over all , to find $\kappa$. However, this may require relatively large computation because of the factorials involved in . We can reduce computation by searching over , , which are given by

Specifically, we calculate all from (23). If but , then is a local maximum. After obtaining all local maxima, we can find the global maximum $f(\kappa)$ from the local maxima.

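After we find $\kappa$, we set the partial derivatives of the $(\kappa + 1)$th term of the sum in (21) with respect to $Q_{G_1}$ and $Q_{G_2}$ to zero. This gives the following optimal $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$

$$Q_{G_1} = \frac{\kappa + |\eta|}{K_E}, \qquad Q_{G_2} = \frac{\kappa}{K_E}, \qquad Q_{G_3} = \frac{K_E - |\eta| - 2\kappa}{K_E}. \tag{24}$$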

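Substituting $Q_{G_1}$ and $Q_{G_2}$ in (24) into (20), we get the probability $q_m$, $m \in G_1$ or $G_2$, that is used to generate the $m$th reaction in each step of simulation. For the $G_3$ group, we get the probability of each reaction as follows

$$q_m = Q_{G_3}\, \frac{a_m(\mathbf{x})}{a_{G_3}(\mathbf{x})}, \qquad m \in G_3, \tag{25}$$

where $a_{G_3}(\mathbf{x}) = \sum_{m \in G_3} a_m(\mathbf{x})$.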
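The search for $\kappa$ and the resulting group probabilities can be sketched numerically. The code below is an illustration under stated assumptions rather than the paper's exact procedure: it assumes that the summand for $K_{G_2} = k$ has the multinomial form with $K_{G_1} = k + |\eta|$, $K_{G_3} = K_E - |\eta| - 2k$ and the last reaction taken from $G_1$, it evaluates terms in log space with `math.lgamma` instead of using the ratio recursion, and it takes the optimal probabilities to be the corresponding fractions of $K_E$. All names are hypothetical, and `P1`, `P2`, `P3` stand for the natural group probabilities, assumed strictly positive.

```python
from math import lgamma, log


def log_term(k, K_E, eta, P1, P2, P3):
    """Log of the assumed summand with K_G2 = k (eta must be a nonzero integer)."""
    n1, n2, n3 = k + abs(eta), k, K_E - abs(eta) - 2 * k
    # log of (K_E - 1)! / ((n1 - 1)! n2! n3!), using lgamma(n + 1) = log(n!)
    log_coeff = lgamma(K_E) - lgamma(n1) - lgamma(n2 + 1) - lgamma(n3 + 1)
    return log_coeff + n1 * log(P1) + n2 * log(P2) + n3 * log(P3)


def select_group_probabilities(K_E, eta, P1, P2, P3):
    """Find kappa maximizing the assumed summand, then return (kappa, (Q_G1, Q_G2, Q_G3))."""
    k_max = (K_E - abs(eta)) // 2
    kappa = max(range(k_max + 1),
                key=lambda k: log_term(k, K_E, eta, P1, P2, P3))
    n1, n2, n3 = kappa + abs(eta), kappa, K_E - abs(eta) - 2 * kappa
    return kappa, (n1 / K_E, n2 / K_E, n3 / K_E)
```

Working with logarithms avoids overflow in the factorials for the values of $K_E$ (hundreds or more) that are typical in practice, at the cost of evaluating every term once.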

While we can use $q_m$ in (25) to generate reactions in the $G_3$ group, we next develop an optional method for fine-tuning $q_m$, $m \in G_3$, which can further reduce the variance of the estimate of $P(E_R)$. We divide the $G_3$ group into three subgroups: $G_{31}$, $G_{32}$ and $G_{33}$. Occurrence of reactions in the $G_{31}$ group increases the probability of occurrence of reactions in the $G_1$ group or reduces the probability of occurrence of reactions in the $G_2$ group, which in turn increases the probability of the rare event. Occurrence of reactions in the $G_{32}$ group reduces the probability of occurrence of reactions in the $G_1$ group or increases the probability of occurrence of reactions in the $G_2$ group, which reduces the probability of the rare event. Occurrence of reactions in the $G_{33}$ group does not change the probability of occurrence of reactions in the $G_1$ and $G_2$ groups, and hence does not change the probability of the rare event.

Let , and be the average numbers of reactions from $G_{31}$, $G_{32}$ and $G_{33}$ that occur in the time interval $[0, T]$ in the original system. We define , and . Our goal is to make $Q_{31}$ greater than and $Q_{32}$ less than to increase the probability of the rare event. However, this may not be feasible when . Hence, we can fine-tune , and only when and propose the following formula to determine $Q_{31}$, $Q_{32}$ and $Q_{33}$:

where $\alpha, \beta \in (0, 1)$ are two pre-specified constants. It is not difficult to verify from (26) that . To ensure that and , we choose $\alpha$ and $\beta$ satisfying $0 \le \beta < 1$ and .

Finally, we obtain $q_m$ for $m \in G_3$ as follows

where , $i = 1, 2, 3$.

### 4.3 Systems with dimerization reactions

So far we assumed that the system did not have any dimerization reactions, i.e., the system consisted of reactions with $|\nu_{im}| = 0$ or 1. We now generalize the methods developed earlier to systems with dimerization reactions. If there are dimerization reactions in the $G_1$ and $G_2$ groups, we further divide the $G_1$ group into $G_{11}$ and $G_{12}$ subgroups and the $G_2$ group into $G_{21}$ and $G_{22}$ subgroups. The $G_{11}$ group contains reactions with $\nu_{im}\,\mathrm{sign}(\eta) = 1$, where $\mathrm{sign}(\eta) = 1$ when $\eta > 0$ and $\mathrm{sign}(\eta) = -1$ when $\eta < 0$. The $G_{12}$ group contains reactions with $\nu_{im}\,\mathrm{sign}(\eta) = 2$. The $G_{21}$ group contains reactions with $\nu_{im}\,\mathrm{sign}(\eta) = -1$, while the $G_{22}$ group contains reactions with $\nu_{im}\,\mathrm{sign}(\eta) = -2$.

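Let us define $K_{G_{11}} = \sum_{m \in G_{11}} K_m$, $K_{G_{12}} = \sum_{m \in G_{12}} K_m$, $K_{G_{21}} = \sum_{m \in G_{21}} K_m$ and $K_{G_{22}} = \sum_{m \in G_{22}} K_m$. Clearly, we have $K_{G_1} = K_{G_{11}} + K_{G_{12}}$ and $K_{G_2} = K_{G_{21}} + K_{G_{22}}$. Then, (13) becomes

$$K_{G_{11}} + 2K_{G_{12}} - K_{G_{21}} - 2K_{G_{22}} = |\eta|. \tag{28}$$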

Let us consider systems with *G*_{1} and *G*_{2} groups but without *G*_{3} group. Although we still have or equivalently , we cannot obtain four unknowns , , and from only two equations.

Suppose that , , and are the average numbers of reactions from the $G_{11}$, $G_{12}$, $G_{21}$ and $G_{22}$ groups that occur in the time interval $[0, T]$ in the original system. We notice from (20) that we do not change the ratio of the probabilities of two reactions in the same group, i.e., if $m_1$ and $m_2$ are in the same $G_1$ or $G_2$ group. Therefore, we would expect that and . Using these two relationships, we can write (28) as

where and .

From (17) and (29), we obtain and . Substituting and into (18) and maximizing $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$, we obtain

We then substitute $Q_{G_1}$ and $Q_{G_2}$ into (20) to get $q_m$.

Now let us consider systems with $G_1$, $G_2$ and $G_3$ reactions. From (29), we have , and from (15) and (29), we obtain . Since , we have . Following the derivations in Section 4.2, we can get $q_m$ for any reaction. More specifically, substituting , and the upper limit of into (21), we obtain $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$. We can also get $P(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ similar to (22) by replacing in $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ with . Then, we determine the maximum term of the sum in $P(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$ and denote the value of corresponding to the maximum term as $\kappa + 1$. We find $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$ by maximizing the $(\kappa + 1)$th term of the sum in $Q(\mathbf{X}(t) \in \Omega \mid K_t = K_E)$. Finally, we substitute $Q_{G_1}$ and $Q_{G_2}$ into (20) to get $q_m$, $m \in G_1$ or $G_2$. For the reactions in the $G_3$ group, we can either substitute $Q_{G_3}$ into (25) to obtain $q_m$ or, if we want to fine-tune $q_m$, use (26) and (27) to get $q_m$.

### 4.4 wSSA and wNRM with parameter selection

The key to determining the probability $q_m$ of each reaction is to find the total probability of each group, , , , , and . This requires the average number of reactions of each group occurring during the interval $[0, T]$ in the original system, , , , , , , , . If the system is relatively simple, we may get these numbers analytically. If we cannot obtain them analytically, we can estimate them by running Gillespie's exact SSA. Since the number of runs needed to estimate these numbers is much smaller than the number of runs needed to estimate the probability of the rare event, the computational overhead is negligible.
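The preliminary estimation step can be sketched with a plain direct-method SSA that counts firings per group. This is a minimal illustration, assuming a user-supplied propensity function and stoichiometry; the function name `estimate_group_means` and its arguments are hypothetical, and the sketch returns the mean firing count per group together with the sample mean and standard deviation of $K_T$.

```python
import math
import random
import statistics


def estimate_group_means(runs, T, x0, propensity, nu, group_of, seed=0):
    """Run the unweighted SSA `runs` times (runs >= 2) and summarize firing counts.

    propensity(x) -> list of a_m(x); nu[m] -> state change of reaction m;
    group_of[m]   -> label of the group that reaction m belongs to.
    """
    rng = random.Random(seed)
    totals, K_T = {}, []
    for _ in range(runs):
        t, x, fired = 0.0, list(x0), 0
        while True:
            a = propensity(x)
            a0 = sum(a)
            if a0 <= 0.0:
                break                      # no reaction can fire any more
            t += math.log(1.0 / (1.0 - rng.random())) / a0
            if t > T:
                break                      # next reaction falls outside [0, T]
            r, mu, acc = rng.random() * a0, 0, a[0]
            while acc <= r and mu < len(a) - 1:
                mu += 1
                acc += a[mu]
            x = [xi + d for xi, d in zip(x, nu[mu])]
            totals[group_of[mu]] = totals.get(group_of[mu], 0) + 1
            fired += 1
        K_T.append(fired)
    means = {g: c / runs for g, c in totals.items()}
    return means, statistics.mean(K_T), statistics.stdev(K_T)
```

For a pure birth process with constant propensity 2 and $T = 5$, the returned mean of $K_T$ should be close to 10 and its standard deviation close to $\sqrt{10}$, which gives a quick sanity check of the counting logic.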

We next summarize the wSSA incorporating the parameter selection method in the following algorithm. We will not include the procedure for fine-tuning the probability rate constants of reactions in the *G*_{3} group, but will describe how to add this optional procedure to the algorithm. We will also describe how to modify Algorithm 1 to incorporate the parameter selection procedure into the wNRM.

**Algorithm 2 (wSSA with parameter selection)**

*1.* run Gillespie's exact SSA $10^3$-$10^4$ times to estimate the average number of reactions of each group occurring in $[0, T]$ and the mean and standard deviation of $K_T$; determine $K_E$ from (16).

*2.* if the system has only $G_1$ and $G_2$ reactions, calculate $Q_{G_1}$ and $Q_{G_2}$ from (19) if there is no dimerization reaction or from (30) if there are dimerization reaction(s); if the system has $G_1$, $G_2$ and $G_3$ reactions, calculate $Q_{G_1}$, $Q_{G_2}$ and $Q_{G_3}$ from (24).

*3.* $k_1 \leftarrow 0$, $k_2 \leftarrow 0$.

*4.* **for** $i = 1$ to $n$, **do**

*5.* $t \leftarrow 0$, $\mathbf{x} \leftarrow \mathbf{x}_0$, $w \leftarrow 1$.

*6.* **while** $t \le T$, **do**

*7.* **if** $\mathbf{x} \in \Omega$, **then**

*8.* $k_1 \leftarrow k_1 + w$, $k_2 \leftarrow k_2 + w^2$

*9.* break out of the while loop

*10.* **end if**

*11.* evaluate all $a_m(\mathbf{x})$; calculate $a_0(\mathbf{x})$.

*12.* generate two unit-interval uniform random variables $r_1$ and $r_2$.

*13.* $\tau \leftarrow \ln(1/r_1)/a_0(\mathbf{x})$

*14.* calculate all $q_m$ from (20) and (25).

*15.* $\mu \leftarrow$ smallest integer satisfying $\sum_{m=1}^{\mu} q_m \ge r_2$.

*16.* $w \leftarrow w \times (a_\mu(\mathbf{x})/a_0(\mathbf{x}))/(q_\mu(\mathbf{x})/q_0(\mathbf{x}))$.

*17.* $\mathbf{x} \leftarrow \mathbf{x} + \nu_\mu$, $t \leftarrow t + \tau$.

*18.* **end while**

*19.* **end for**

*20.* $\sigma^2 \leftarrow k_2/n - (k_1/n)^2$

*21.* estimate $P(E_R)$ as $k_1/n$, with a 68% uncertainty of $\pm\sigma/\sqrt{n}$.
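The loop structure of Algorithm 2 can be sketched on a birth-death system with one reaction per group. This is a simplified illustration, not the general algorithm: the model (constant birth propensity `k_birth`, death propensity `k_death * x`), the function name, and the choice of passing `Q_G1` directly instead of computing it in steps 1 and 2 are assumptions. The target is taken above the initial state, so the birth reaction forms $G_1$ and the death reaction forms $G_2$; when the death propensity is zero, the sketch gives all probability to the birth reaction.

```python
import math
import random


def wssa_birth_death(n, T, x0, target, k_birth, k_death, Q_G1, seed=0):
    """Weighted SSA estimate of P(x reaches `target` within [0, T]), target > x0."""
    rng = random.Random(seed)
    k1 = k2 = 0.0
    for _ in range(n):
        t, x, w = 0.0, x0, 1.0
        while t <= T:
            if x == target:                      # rare event reached
                k1 += w
                k2 += w * w
                break
            a = [k_birth, k_death * x]           # natural propensities
            a0 = a[0] + a[1]
            tau = math.log(1.0 / (1.0 - rng.random())) / a0
            q_birth = Q_G1 if a[1] > 0.0 else 1.0
            q = [q_birth, 1.0 - q_birth]         # biased selection probabilities
            mu = 0 if rng.random() < q[0] else 1
            w *= (a[mu] / a0) / q[mu]            # likelihood-ratio update
            x += 1 if mu == 0 else -1
            t += tau
    mean = k1 / n
    var = max(k2 / n - mean * mean, 0.0)
    return mean, math.sqrt(var / n)              # estimate and its one-sigma uncertainty
```

Because the estimator is unbiased for any admissible `Q_G1`, runs with different values of `Q_G1` should agree within their reported uncertainties; only the size of the uncertainty changes.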

If the condition required for fine-tuning is satisfied and we want to fine-tune the probability rate constants of the reactions in the $G_3$ group, we modify Algorithm 2 as follows. In step 1, we also estimate the average numbers of reactions from $G_{31}$, $G_{32}$ and $G_{33}$ and choose the values of $\alpha$ and $\beta$ in (26). In step 2, we also calculate $Q_{31}$, $Q_{32}$ and $Q_{33}$ from (26). In step 14, we calculate $q_m$ for $G_3$ reactions from (27) instead of (25). Compared with the refined wSSA [23], the wSSA with our parameter selection procedure does not need to guess the parameters for adjusting the probability $q_m$ of each reaction, but directly calculates $q_m$ using a systematically developed method. This has two main advantages. First, our method will always adjust $q_m$ appropriately to reduce the variance of the estimate of $P(E_R)$, whereas the refined wSSA may not adjust $q_m$ as well as our method, especially if the initial guessed values are far away from the optimal values. Second, as we mentioned earlier, the computational overhead of our method is negligible, whereas the refined wSSA requires non-negligible computational overhead for determining parameters. Indeed, as we will show in Section 5, the variance of the estimate provided by the wSSA with our parameter selection method can be more than one order of magnitude lower than that provided by the refined wSSA for a given number of runs $n$. Moreover, the wSSA with our parameter selection method is faster than the refined wSSA, since it requires less computational overhead to adjust $q_m$.

We can also incorporate our parameter selection method without the fine-tuning procedure into the wNRM as follows. We replace the first step of Algorithm 1 with the first three steps of Algorithm 2. We then modify the fourth step of Algorithm 1 as follows: evaluate all $a_m(\mathbf{x})$, calculate all $q_m$ from (20) and (25), and calculate all $d_m(\mathbf{x})$ as $d_m(\mathbf{x}) = q_m a_0(\mathbf{x})$. Finally, we change the fifth step of Algorithm 1 to the following: find the $a_m(\mathbf{x})$ that need to be updated from the dependency graph and evaluate these $a_m(\mathbf{x})$; calculate all $q_m$ from (20) and (25), and calculate all $d_m(\mathbf{x})$ as $d_m(\mathbf{x}) = q_m a_0(\mathbf{x})$. We can also fine-tune the probability rate constants of $G_3$ reactions in the wNRM in the same way as described in the previous paragraph for the wSSA. Note that since our parameter selection method employs a systematic method for partitioning reactions into three groups as discussed earlier, our method can be applied to any real chemical reaction system.