Modeling stochasticity and variability in gene regulatory networks

Modeling stochasticity in gene regulatory networks is an important and complex problem in molecular systems biology. To elucidate intrinsic noise, several modeling strategies such as the Gillespie algorithm have been used successfully. This article contributes an approach as an alternative to these classical settings. Within the discrete paradigm, where genes, proteins, and other molecular components of gene regulatory networks are modeled as discrete variables and are assigned as logical rules describing their regulation through interactions with other components. Stochasticity is modeled at the biological function level under the assumption that even if the expression levels of the input nodes of an update rule guarantee activation or degradation there is a probability that the process will not occur due to stochastic effects. This approach allows a finer analysis of discrete models and provides a natural setup for cell population simulations to study cell-to-cell variability. We applied our methods to two of the most studied regulatory networks, the outcome of lambda phage infection of bacteria and the p53-mdm2 complex.


Introduction
Variability at the molecular level, defined as the phenotypic differences within a genetically identical population of cells exposed to the same environmental conditions, has been observed experimentally [1][2][3][4]. Understanding mechanisms that drive variability in molecular networks is an important goal of molecular systems biology, for which mathematical modeling can be very helpful. Different modeling strategies have been used for this purpose and, depending on the level of abstraction of the mathematical models, there are several ways to introduce stochasticity. Dynamic mathematical models can be broadly divided into two classes: continuous, such as systems of differential equations (and their stochastic variants) and discrete, such as Boolean networks and their generalizations (and their stochastic variants). This article will focus on stochasticity and discrete models.
Discrete models do not require detailed information about kinetic rate constants and they tend to be more intuitive. In turn, they only provide qualitative information about the system. The most general setting is as follows. Network nodes represent genes, proteins, and other molecular components of gene regulation, while network edges describe biological interactions among network nodes that are given as logical rules representing their interactions. Time in this framework is implicit and progresses in discrete steps. More formally, let x 1 , ..., x n be variables, which can take values in finite sets X 1 , ..., X n , respectively. Let X = X 1 × ... × X n be the Cartesian product. A discrete dynamical system (DDS) in the variables x 1 , ..., x n is a function f = (f 1 , . . . , f n ) : X → X where each coordinate function f i : X X i is a function in a subset of {x 1 , ..., x n }. Dynamics is generated by iteration of f, and different update schemes can be used for this purpose. As an example, if X i = {0, 1} for all i, then each f i is a Boolean rule and f is a Boolean network where all the variables are updated simultaneously. We will assume that each X i comes with a natural total ordering of its elements (corresponding to the concentration levels of the associated molecular species). Examples of this type of dynamical system representation are Boolean networks, logical models and Petri nets [5][6][7].
To account for stochasticity in this setting several methods have been considered. Probabilistic Boolean networks (PBNs) [8,9] introduce stochasticity in the update functions, allowing a different update function to be used at each iteration, chosen from a probability space of such functions for each network node. For other approaches, see [10][11][12]. These models will be discussed in more detail in the next section. In this article we present a model type related to PBNs, with additional features. We show that this model type is natural and a useful way to simulate gene regulation as a stochastic process, and is very useful to simulate experiments with cell populations.

Modeling stochasticity in gene regulatory networks
Gene regulation processes are inherently stochastic. Accurately modeling this stochasticity is a complex and important goal in molecular system biology. Depending on the level of knowledge of the biological system and the availability of data for it one could follow different approaches. For instance, viewing a gene regulatory network as a biochemical reaction network, the Gillespie algorithm can be applied to simulate each biochemical reaction separately generating a random walk corresponding to a solution of the chemical master equation of the system [13,14]. At an even more detailed level one could introduce time delays into the Gillespie simulations to account for realistic time delays in activation or degradation such as in circadian rhythms [15][16][17]. At a higher level of abstraction, stochastic differential equations [18] contain a deterministic approximation of the system and an additional random white noise term. However, all these schemes require that all the kinetic rate constants to be known which could represent a strong constraint due to the difficulty of measuring kinetic parameters, limiting these approaches to small systems.
As mentioned in the introduction, discrete models are an alternative to continuous models, which do not depend on rate constants. In this setting, several approaches to introduce stochasticity have been proposed. Specially for Boolean networks, stochasticity has been introduced by flipping node states from 0 to 1 or vice versa with some flip probability [12,[19][20][21]. However, it has been argued that this way of introducing stochasticity into the system usually leads to over-representation of noise [11]. The main criticism of this approach is that it does not take into consideration the correlation between the expression values of input nodes and the probability of flipping the expression of a node due to noise. In fact, this approach models the stochasticity at a node regardless of the susceptibility to noise of the underlying biological function [11].
Probabilistic Boolean networks [8,9,22] is another stochastic method proposed within the discrete strategy. PBNs model the choice among alternate biological functions during the iteration process, rather than modeling the stochasticity of the function failure itself. We have adopted a special case of this setting, in which every node has associated to it two functions: the function that governs its evolution over time and the identity function. If the first is chosen, then the node is updated based on its logical rule. When the identity function is chosen, then the state of the node is not updated. The key difference to a PBN is the assignment of probabilities that govern which update is chosen. In our setting, each function gets assigned two probabilities. Precisely, let x i be a variable. We assign to it a probability p ↑ i , which determines the likelihood that x i will be updated based on its logical rule, if this update leads to an increase/activation of the variable. Likewise, a probability p ↓ i determines this probability in case the variable is decreased/inhibited. The necessity for considering two different probabilities is that activation and degradation represent different biochemical processes and even if these two are encoded by the same function, their propensities in general are different. This is very similar to what is considered in differential equations modeling, where, for instance, the kinetic rate parameters for activation and for degradation/decay are, in principle, different.
Note that all these approaches only take account of intrinsic noise which is generated from small fluctuations in concentration levels, small number of reactant molecules, and fast and slow reactions. Another source of stochasticity is related to extrinsic noise such as a noisy cellular environment and temperature. For more about intrinsic vs extrinsic noise see [3,23].

Method
Our aim is to model stochasticity at the biological function level under the main assumption that even if the expression levels of the input nodes of an update function guarantee activation or degradation there is a probability that the process will not occur due to stochasticity, for instance, if some of the chemical reactions encoded by the update function may fail to occur. This is similar to models based on the chemical master equation. This model type introduces activation and degradation propensities. More formally, let x 1 , ..., x n be variables which can take values in finite sets X 1 , ..., X n , respectively. Let X = X 1 × ... × X n be the Cartesian product. Thus, the formal definition of a stochastic discrete dynamical system (SDDS) in the variables x 1 , ..., x n is a collection of n triplets We now proceed to study the dynamics of such systems and two specific models as illustration.

Dynamics of SDDS
be a SDDS and consider x X.
That is, if the possible future value of the i-th coordinate is larger (smaller, resp.) than the current value, then the activation (degradation) propensity determines the probability that the i-th coordinate will increase (decrease) its current value. If the i-th coordinate and its possible future value are the same, then the i-th coordinate of the system will maintain its current value with probability 1. Notice that π i, x (x i y i ) = 0 for all The dynamics of F is given by the weighted graph X which has an edge from x X to y X if and only if y i {x i , f i (x)} for all i. The weight of an edge x y is equal to the product By convention we omit edges with weight zero. See Additional file 1 for pseudocodes of algorithms to compute dynamics of SDDS. Software to test examples is available at http://dvd.vbi.vt.edu/adam.html [24] as a web tool (choose SDDS in the model type).
a SDDS, it is straightforward to verify that F has the same steady states (fixed points) as the deterministic system G = {f i } n i=1 (see Additional file 1). It is also important to note that the dynamics of F includes the different trajectories that can be generated from G using other common update mechanisms such as the synchronous and asynchronous schemes (see Additional file 1). Table 4 represents the regulatory rules for x1 and x2 and Table 5  Pr(00 00) = (1)(1) = 1. Figure 1 shows that there is a 9% chance that the system will transition from 01 to 10. Similarly, there is an 81% chance that the system will transition from 01 to 00. The latter was expected because there is a high degradation propensity for f 2 . Note that 00 is a fixed point, i.e., there is 100% chance of staying at this state.

Applications
We illustrate the advantages of this model type by applying it to two widely studied biological systems, the regulation of the p53-mdm2 network and the control of the outcome of phage lambda infection of bacteria. These regulatory networks were selected because stochasticity plays a key role in their dynamics.

Regulation in the p53-Mdm2 network
The p53-Mdm2 network is one of the most widely studied gene regulatory networks. Abou-Jaude et al. [25] proposed a logical four-variable model to describe the dynamics of the tumor suppressor protein p53 and its negative regulator Mdm2 when DNA damage occurs. The wiring diagram of this model is represented in Figure 2, where P denotes cytoplasmic p53, nucleic p53, and the gene p53.
Mc and Mn stand for cytoplasmic Mdm2 and nuclear Mdm2, respectively. DNA damage caused by ionic irradiation decreases the level of nucleic Mdm2 which enables p53 to accumulate and to remain active, playing a key role in reducing the effect of the damage. There is a negative feedback loop involving three components: p53 increases the level of cytoplasmic Mdm2 which, in turn, increases the level of nuclear Mdm2. Nucleic Mdm2 reduces p53 activity. This model also contains a positive feedback loop involving two components where p53 inhibits its negative regulator nucleic Mdm2. Note the dual role of P, as it positively regulates nucleic Mdm2 through cytoplasmic Mdm2. On the other hand, P negatively regulates nucleic Mdm2 by inhibiting Mdm2 nuclear translocation [25]. For more about the p53-Mdm2 system (see [4,25,26]).
The dynamic behavior of the system is represented in a network of transitions called its state space (see Figure 3). This specifies the different paths to follow and the probabilities of following a specific trajectory from a given state. Dynamics here is not deterministic, i.e., most of the state vectors have different trajectories they can follow. The propensity parameters in Table 1 determine the likelihood of following certain paths. The state 0010 is a steady state, which is differentiated from the others by its oval shape.
The  oscillations of the expression level of p53 as the degradation propensities of the damage increases. This is correlated with the fact that, if the intensity of the damage is increased, more cells exhibit oscillations in the level of p53 which was experimentally observed in [4]. The initial state for all simulations was 0011 which represents the state when DNA damage is introduced (0010 is the steady state without perturbation).
To highlight the features of our approach we compare our model with the one presented in [25] in which variability has been analyzed. The main difference between these two models is in the way the simulations are performed. In [25], the transition from one state to the next is determined by parameters called "on" and "off" time delays. For instance, to transition from 2001 to 2101 it is required that t Mc < t dam which means that the "on" delay for Mc (time for activating) is less than the "off" delay (time for degrading) of the damage. Otherwise, if t Mc > t dam the system will transition from 2001 to 2000. In this article, transitions from one state to others are given as probabilities which are determined from the propensity probabilities. Therefore, the complexity of the model presented here is at the level of the wiring diagram (i.e. the number of variables) while the complexity of the model in [25] is at the level of the state space (i.e. number of possible states) which is exponential in the number of variables. Another key difference is the way DNA damage repair is modeled. In [25], a delay parameter t dam is associated with the disappearance of the damage, and this is decreased by a certain amount τ at each iteration so that    Note that there is a low degradation propensity for DNA damage.
iterations. In order to simulate DNA damage with this approach it is required to estimate τ, n, and t (0) dam . Within our model framework a single parameter, the degradation propensity, is used to model the damage repair which is a more natural setup.

Phage lambda infection of bacteria
Control of the outcome of phage lambda infection is one of the best understood regulatory systems [3,27,28]. Figure 6 depicts its core regulatory network that was first modeled by Thieffry and Thomas [28] using a logical approach. This model encompasses the roles of the regulatory genes CI, CRO, CII, and N. From experimental reports [3,[28][29][30] it is known that, if the gene CI is fully expressed, all other genes are off. In the absence of CRO protein, CI is fully expressed (even in the absence of N and CII). CI is fully repressed provided that CRO is active and CII is absent. The dynamics of this network is a bistable switch between lysis and lysogeny, Figure 7. Lysis is the state where the phage will be replicated, killing the host. Otherwise, the network will transition to a state called lysogeny where the phage will incorporate its DNA into the bacterium and become dormant. It has been suggested [28,31] that these cell fate differences are due to spontaneous changes in the timing of individual biochemical reaction events.
The state space for this model is specified by [0, 2] × [0, 3] × [0, 1] × [0, 1], that is, the first variable, CI, has three levels 0, 1, 2, the second variable, CRO, has four levels {0, 1, 2, 3}, and the third and fourth variables, CII and N, are Boolean. Update functions for this model are available in our supporting material, Additional file 1. This model has a steady state, 2000, and a 2-cycle involving 0200 and 0300. The steady state 2000 represents lysogeny where CI is fully expressed while the other genes are off. The cycle between 0200 and 0300 represents lysis where CRO is active and other genes are repressed.
Cell population simulations were performed to measure the cell-to-cell variability. Figure 8 was generated using the probabilities given in Tables 2 (top frame) Table 1. Each subfigure shows oscillations as long as the damage is present. This figure shows variability in the timing of damage repair and in the period of the oscilations. Each frame was generated from a single simulation with sixty time steps. The x-axis represents discrete time steps and the y-axis the expression level. The initial state for all simulations is 0011.
average expression level. The initial state for all simulations was 0000 which represents the state of the bacterium at the moment of phage infection. Figure 8 shows variability in developmental outcome, some of the networks transition to lysis while others transition to lysogeny. To measure how sensitive the dynamics of the network is to changes in the propensity probabilities, we have plotted the outcome of lysis-lysogeny percentages for different choices of these parameters. Figure 9 shows the variation in developmental outcome as a function of the propensity parameters of CI and CRO. Star points indicate the percentage of networks that transition to lysogeny and circle shaped points indicate the percentage of networks that end up in lysis. The bottom x-axis contains activation propensities for CI and degradation propensities for CRO while the top x-axis contains activation propensities for CRO and degradation propensities for CI. The activation and degradation propensities for CII and N were all set equal to .9. Although the probability distributions for CI and CRO are very symmetric in Figure 9, it gives a good idea of how the variability in developmental outcome will change as the propensity parameters change.

Conclusions
Using a discrete modeling strategy, this article introduces a framework to simulate stochasticity in gene  There is a high activation propensity for CI while a low activation propensity for CRO. There is a high activation propensity for CRO while a low activation propensity for CI. regulatory networks at the function level, based on the general concept of PBNs. It accounts for intrinsic noise due to spontaneous differences in timing, small fluctuations in concentration levels, small numbers of reactant molecules, and fast and slow reactions. This framework was tested using two widely studied regulatory networks, the regulation of the p53-Mdm2 network and the control of phage lambda infection of bacteria. It is shown that in both of these examples the use of propensity probabilities for activation and degradation of network nodes provides a natural setup for cell population simulations to study cell-to-cell variability. The new features of this framework are the introduction of activation and degradation propensities that determine how fast or slow the discrete variables are being updated. This provides the ability to generate more realistic simulations of both single cell and cell population dynamics. In the example of the p53-Mdm2 system, one can see that individual simulations show sustained oscillations when DNA damage is present, while at the cell population level these individual oscillations average to a damped oscillation. This agrees with experimental observations [4]. In the second example, l-phage infection of bacteria, it is observed that differences in developmental outcome due to intrinsic noise can be captured with this framework. Due to the lack of experimental data we are unable to calibrate the model so that it reproduces the correct difference in percentages due to intrinsic noise. So instead we present a plot of the difference in CI CII CRO N Figure 6 Wiring diagram for phage lambda infection model. It is worth noting that this article addresses only intrinsic noise generated from small fluctuations in concentration levels, small numbers of reactant molecules, and fast and slow reactions. Extrinsic noise is another source of stochasticity in gene regulation [3,23], and it would be interesting to see if this framework or a similar setup can be adapted to account for extrinsic stochasticity under the discrete approach. This framework also lends itself to the study of intrinsic noise and it is useful for the study of developmental robustness. For instance, one could ask what the effect of this type of noise is on the dynamics of networks controlled by biologically inspired functions.
Relating the propensity parameters to biologically meaningful information or having a systematic way for estimating them is very important. A preliminary analysis shows that it is possible to relate the propensity parameters in this framework with the propensity functions in the Gillespie algorithm under some conditions (see Additional file 1 where for a simple degradation model, the degradation propensity is correlated by a linear equation with the decay rate of the species being degraded). More precisely, in the Gillespie algorithm [13,14], if one discretizes the number of molecules of a chemical species into discrete expression levels such that within these levels the propensity functions for this species do not change significantly, then one obtains the setup of the framework presented here as a discrete model. That is, simulation within the framework presented here can be viewed as a further discretization of the Gillespie algorithm, in a setting that does not require exact knowledge of model parameters. For a similar approach see [10].  Table 2 shows 93% lysis and 7% lysogeny while bottom frame for parameters in Table 3 shows 4% lysis and 96% lysogeny. The x-axis represents discrete time steps while the y-axis shows the average expression level. The initial state for all the simulations is 0000. Solid (circle) points correspond to the average of CI (CRO), and dashed lines represent standard deviations.  Variation in developmental outcome as a function of the propensity parameters. Star points indicate the percentage of networks that transition to lysogeny and circle shaped points indicate the percentage of networks that end up in lysis. Bottom axis represents the activation (and degradation) propensities for CI (CRO) in increasing order. Likewise, the top axis represents the activation (and degradation) propensities for CRO (CI) in decreasing order.